We undertake a detailed study of the performance of maximum likelihood (ML) estimators of the 
density matrix of finite-dimensional quantum systems, in order to interrogate generic properties of 
frequentist quantum state estimation. Existing literature on frequentist quantum estimation has 
not rigorously examined the finite sample performance of the estimators and associated methods 
of hypothesis testing. While ML is usually preferred on the basis of its asymptotic properties - it 
achieves the Cramer-Rao (CR) lower bound - the finite sample properties are often less than optimal. 
We compare the asymptotic and finite-sample properties of the ML estimators and test statistics 
for two different choices of measurement bases: the average case optimal or mutually unbiased bases 
(MUB) and a representative set of suboptimal bases, for spin-1/2 and spin-1 systems. We show 
that, in both cases, the asymptotic standard errors of the ML estimators grossly underestimate the 
estimation error in finite samples, rendering inference based on the asymptotic properties of the 
ML unreliable and misleading for experimentally realistic sample sizes. The results indicate that in 
order to fully exploit the information geometry of quantum states and achieve smaller reconstruction 
errors, the use of Bayesian state reconstruction methods - which, unlike frequentist methods, do not 
rely on asymptotic properties - is necessary, since the estimation error is typically lower due to the 
incorporation of prior knowledge. 

PACS numbers: 05.30.-d, 02.50.-r, 03.65.Wj, 03.67.-a 



I. INTRODUCTION 



Perhaps the most fundamental problem in quantum 
statistical inference (QSI) is the reconstruction of the 
density matrix of a quantum system on the basis of 
a finite number of quantum observations. Due to the 
rapidly growing interest in quantum computation and 
quantum control, the ability to retrieve the maximum 
amount of information about a quantum state based on 
the smallest number of measurements is a subject of 
paramount importance. The accuracy of all derivative 
forms of QSI, including process estimation, is ultimately 
determined by that of the underlying state estimation. 

Methods for quantum state estimation can be formally subdivided into
three categories. The first, tomographic inversion, is the least
computationally expensive and most popular technique. However,
tomographic inversion cannot enforce the constraints on the density
matrix during estimation, and hence the estimates produced are often
not physically meaningful. The second class consists of frequentist
techniques of inference based on a likelihood function, the most
notable of which is maximum likelihood (ML) estimation. This class of
methods avoids the problems associated with tomography and is
asymptotically more efficient than the latter



(as explained in Section III E). However, it delivers dis- 
tributional results for the estimators of the parameters 
of interest under the assumption of an infinite number of 
measurements. Hence, for finite sample sizes, the esti- 
mated confidence intervals for the parameters may have 
actual coverages quite different from the corresponding 
asymptotic ones and, as is well known in the statis- 
tics literature, this divergence typically tends to become 
more pronounced as the number of parameters and nonlinearity of the
model increases. All forms of frequentist inference, including
tomographic inversion and ML estimation, require a complete
observation level, i.e. $N^2 - 1$ linearly independent observable
operators where $N$ is the Hilbert space dimension, in order to
estimate all the parameters. The third type, Bayesian estimation
[4, 5, 6, 7], which is based on updating a prior plausibility
distribution of the parameters based on observed data, lends itself
readily to both incomplete observation levels and a finite number of
measurements. Moreover, the estimation error that arises in Bayesian
methods is typically lower, reflected in shorter lengths of Bayesian
confidence intervals compared to their frequentist counterparts, due
to the use of such priors^1.



*Electronic address: rajchak@princeton.edu



^1 The principle of entropy maximization (PEM) is an estimation 
methodology that can consistently estimate all parameters with 
an incomplete observation level, since it implicitly assumes a 
prior plausibility distribution over the parameter space. How- 
ever, it has been shown that the von Neumann entropy employed 
In this paper, we investigate the finite sample properties of ML
estimators of the density matrix of finite-dimensional, spin-1/2 (one
qubit) and spin-1 quantum systems. Among frequentist estimation
techniques, ML is usually preferred on the basis of its asymptotic
properties: (i) the ML estimator is asymptotically efficient in the
sense that its asymptotic variance achieves the Cramer-Rao lower bound
for consistent estimators, and (ii) likelihood based testing
approaches are optimal, in the sense of the Neyman-Pearson Fundamental
Lemma and the Large Deviation Principle, for a broad class of
hypothesis testing problems. However, the finite sample properties of
ML estimators and test statistics are often less than optimal.
Existing literature on quantum ML has not rigorously examined the
finite sample performance of the estimators and associated methods of
hypothesis testing. To our knowledge, there is only one extant study
on the efficiency of frequentist quantum state estimation, and robust
numerical techniques are lacking. Minimizing finite sample estimation
errors is essential for making optimal quantum decisions, which
underlie emerging quantum feedback control and computation strategies.
Lack of rigorous understanding of the small sample estimation errors
has inhibited the application of ML to practical problems in quantum
information and control.

We shed light on the following issues: 

• How are the small sample biases of the ML esti- 
mators affected by the sample size? 

• How do the finite-sample standard errors, and 
hence the associated 95% confidence intervals, 
compare to the corresponding asymptotic ones and 
how does the coverage of the intervals change with 
increase in the sample size? 

• How does the small sample behavior of the test 
statistics for physical quantities of interest in 
quantum decision theory compare to their known 
asymptotic behavior for different sample sizes? 

We show that the finite sample properties of ML differ 
significantly from the corresponding asymptotic ones for 
experimentally realistic sample sizes. In particular, the 
finite-sample standard errors are orders of magnitude 
bigger than the corresponding asymptotic ones. This 
feature holds for sample sizes approaching the limit of 
experimental feasibility. Consequently, the asymptotic 
confidence intervals grossly undercover in finite samples 
and the test statistics exhibit severe size distortions. 

In addition, the assessment of the estimation error in 
QSI is complicated by the existence of multiple mea- 
surement strategies due to the noncommutativity of the 



in PEM is not the appropriate measure of information-theoretic 
entropy; hence, we ignore it here. 



probability space, and ambiguities regarding the opti- 
mal measurement strategy. We compare the relative 
efficiencies of average-case optimal (MUB) and repre- 
sentative suboptimal measurement strategies. We show 
that the asymptotically predicted advantages of optimal 
measurements are diminished in finite samples, to the 
extent that measurement strategies that are experimen- 
tally simpler to implement may perform almost as well 
as asymptotically optimal measurements. 

The paper is organized as follows. Section II discusses 
the asymptotic properties of the ML and compares it to 
the alternative estimation approaches. Section III details
the properties of ML estimators of the quantum 
density matrix. Section IV describes the Bloch vec- 
tor parameterization, mutually unbiased measurement 
bases (MUB), and representative suboptimal measure- 
ment strategies considered in this paper. Section V dis- 
cusses the details of the globally convergent Newton- 
Raphson and quasi-Newton algorithms used for con- 
strained parameter optimization, along with methods for 
kernel density estimation of finite sample distributions. 
In Section VI, we present the estimation results for gen- 
eral mixed spin-1/2 and spin-1 density matrices, com- 
paring the finite sample versus asymptotic properties of 
estimators and test statistics for physical quantities of 
interest in quantum decision theory. Finally, in the con- 
cluding Section VII, we draw conclusions regarding the 
efficiency of frequentist quantum state estimation and 
discuss Bayesian extensions. 



II. PROPERTIES OF FREQUENTIST AND 
BAYESIAN ESTIMATORS 

A. Maximum Likelihood Estimators 

Let $x = (x_1, \cdots, x_m)$ be an i.i.d. sample of size $m$ from a
population with probability density function $p(x|\theta)$, which
depends on the unknown parameter vector $\theta$ whose true value is
$\theta_0$. The value of the parameter vector that maximizes the
likelihood function - the joint density of the sample defined as a
function of the unknown parameter vector $\theta$ - is called the ML
estimator of $\theta$:

$$\hat\theta_{ML} = \arg\max_{\theta \in \Theta} L(\theta|x) = \arg\max_{\theta \in \Theta} \left( \prod_{i=1}^{m} p(x_i|\theta) \right),$$

where $\Theta$ denotes the admissible parameter space. Typically, the
logarithm of the likelihood function, $\ln L(\theta|x)$, is easier to
maximize numerically because of its separability. By maximizing the
log likelihood, the ML estimator minimizes the Kullback-Leibler
distance between the estimated and true probability distributions.
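As a minimal illustration of maximizing $\ln L(\theta|x)$ over an admissible space, the sketch below treats the simplest possible case: a single parameter $\theta \in (0, 1)$ giving the probability of one of two outcomes. The binary model, the grid search, and the names `log_likelihood` and `ml_estimate` are assumptions made for illustration only; they are not the estimation procedure used in this paper.

```python
import numpy as np

def log_likelihood(theta, k, m):
    """ln L(theta | x) for m i.i.d. binary outcomes, k of which equal 1."""
    return k * np.log(theta) + (m - k) * np.log(1.0 - theta)

def ml_estimate(x, n_grid=9999):
    """Grid search for the maximizer of ln L over the open interval (0, 1)."""
    x = np.asarray(x)
    k, m = int(x.sum()), x.size
    grid = np.linspace(0.0, 1.0, n_grid + 2)[1:-1]  # drop the endpoints 0 and 1
    return grid[np.argmax(log_likelihood(grid, k, m))]

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=500)  # illustrative data with theta_0 = 0.3
theta_hat = ml_estimate(x)          # matches the sample mean up to the grid spacing
```

For this toy model the maximizer is known in closed form (the sample frequency), which gives a simple check on the numerical search.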

ML has several properties that make it an attractive 
frequentist estimation procedure: 
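1. Consistency: An estimator $\hat\theta^m$ is consistent for the
parameter $\theta$ (written as $\mathrm{plim}\,\hat\theta^m = \theta_0$)
if for every $\epsilon > 0$,

$$\lim_{m \to \infty} P_\theta\{|\hat\theta^m - \theta_0| > \epsilon\} = 0.$$

The ML estimator is consistent: $\mathrm{plim}\,\hat\theta^m_{ML} = \theta_0$.

2. Invariance: The ML estimator of $c(\theta)$ is $c(\hat\theta_{ML})$,
for a continuous and continuously differentiable function $c(\cdot)$.

3. Asymptotic Normality. For a sequence of estimators $\hat\theta^m$,
if $\sqrt{k_m}\,(\hat\theta^m - \theta_0) \xrightarrow{d} N(0, \Sigma)$ as
$m \to \infty$, where $\xrightarrow{d}$ denotes convergence in
distribution and $k_m$ is any function of $m$, $\hat\theta^m$ is said to
be $\sqrt{k_m}$-consistent for $\theta$ and has an asymptotic normal
distribution with asymptotic covariance matrix $\Sigma$.

The ML estimator is asymptotically normally distributed:

$$\sqrt{m}\,(\hat\theta^m_{ML} - \theta_0) \xrightarrow{d} N[0, I^{-1}(\theta_0)], \quad \text{where } I(\theta_0) = -E\left[\frac{\partial^2 \ln L(\theta_0|x)}{\partial\theta\,\partial\theta'}\right].$$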



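$I(\theta_0)$ is called the expected Fisher information matrix. Note
that the asymptotic covariance matrix of the ML estimator is a
function of the unknown parameters. Two approaches exist for
consistent estimation of the expected Fisher information matrix,
thereby providing feasible versions of the observed Fisher information
matrix. The first estimator replaces the expected second derivatives
matrix of the log likelihood function with its sample mean evaluated
at the maximum likelihood estimates,

$$\hat I(\hat\theta_{ML}) = -\frac{1}{m}\,\frac{\partial^2 \ln L(\hat\theta_{ML}|x)}{\partial\theta\,\partial\theta'} = -\frac{1}{m}\sum_{i=1}^{m} \frac{\partial^2 \ln p(x_i|\hat\theta_{ML})}{\partial\theta\,\partial\theta'}. \qquad (1)$$

The second estimator is based on the result that the expected second
derivatives matrix is the covariance matrix of the first derivatives
vector,

$$\hat I(\hat\theta_{ML}) = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial \ln p(x_i|\hat\theta_{ML})}{\partial\theta}\,\frac{\partial \ln p(x_i|\hat\theta_{ML})}{\partial\theta'}.$$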
4. Asymptotically efficient. A sequence of consistent estimators
$\hat\theta^m$ is asymptotically efficient if
$\sqrt{m}\,[\hat\theta^m - \theta_0] \xrightarrow{d} N[0, I^{-1}(\theta_0)]$,
where $I(\theta) = -E\left[\frac{\partial^2 \ln L(\theta|x)}{\partial\theta\,\partial\theta'}\right]$;
$[mI(\theta_0)]^{-1}$ is called the Cramer-Rao lower bound (CRB) for
consistent estimators.

Property 4 is the subject of the following classic 
lemma of frequentist inference. 



Lemma 1 The eigenvalues of the covariance matrix of parameter
estimates of an asymptotically unbiased frequentist estimator are
bounded from below by the eigenvalues of
$(mI(\theta_0))^{-1} = \left[-mE\left(\frac{\partial^2 \ln L}{\partial\theta\,\partial\theta^T}\right)\right]^{-1}$.
The maximum likelihood estimator asymptotically (i.e., in the limit of
an infinite number of measurements) achieves this lower bound.
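The comparison between the asymptotic standard error $[mI(\theta_0)]^{-1/2}$ and the finite-sample standard error can be sketched numerically. The example below is a hedged toy, assuming a one-parameter binary model in which $I(\theta) = 1/(\theta(1-\theta))$; the function names and the Monte Carlo settings are illustrative choices. In this unconstrained, unbiased toy the two standard errors agree closely, so it only shows how the two quantities are computed, not the discrepancy reported for constrained multi-parameter density matrix estimation.

```python
import numpy as np

def asymptotic_se(theta, m):
    """Square root of the Cramer-Rao bound [m I(theta)]^{-1} for a binary model."""
    fisher = 1.0 / (theta * (1.0 - theta))  # per-observation Fisher information
    return np.sqrt(1.0 / (m * fisher))

def monte_carlo_se(theta_0, m, n_rep=20000, seed=0):
    """Empirical standard deviation of the ML estimator over repeated samples."""
    rng = np.random.default_rng(seed)
    estimates = rng.binomial(m, theta_0, size=n_rep) / m  # ML estimate per sample
    return estimates.std(ddof=1)

theta_0, m = 0.3, 50
se_asym = asymptotic_se(theta_0, m)  # evaluated at the true parameter value
se_mc = monte_carlo_se(theta_0, m)   # finite-sample counterpart
```

Replacing the toy model with a constrained, multi-parameter likelihood is where the Monte Carlo and asymptotic values can be expected to separate.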

In addition, likelihood based testing approaches are 
optimal, in the sense of the Neyman-Pearson Funda- 
mental Lemma and the Large Deviation Principle, for a 
broad class of hypothesis testing problems. A hypothesis test $T$,
based on a test statistic $W(x)$ - a function of the data - is a rule
that specifies for which values of $x$ (the acceptance region $A$) the
null hypothesis $H_0 : \theta \in \Theta_0$ is accepted, and for which
values (the rejection region $R$) it is rejected (and the alternative
hypothesis $H_1 : \theta \in \Theta_0^c$, where $\Theta_0^c$ is the
complement of $\Theta_0$, is accepted).

The size $s(T)$ of a hypothesis test $T$ is the probability of
rejecting the null hypothesis given that it is true. The power $p(T)$
of a hypothesis test $T$ is given by the probability of rejecting the
null hypothesis given that it is false. Typically, when defining the
power of a test, one assigns the test to a class based on its size.
For $0 < \alpha < 1$, a test with power function $\beta(\theta)$ is a
size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.

Definition 1 Given a class of hypothesis tests for testing
$H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$, where
$\Theta_0 \cup \Theta_0^c = \Theta$, the admissible parameter space, a
test in that class with power function $\beta(\theta)$ is uniformly
most powerful (UMP) if $\beta(\theta) \geq \beta'(\theta)$ for every
$\theta \in \Theta_0^c$ and every power function $\beta'(\theta)$ of a
test in that class.

An important type of hypothesis test based on ML is the likelihood
ratio test. Likelihood ratio test statistics take the form

$$\lambda(x) = \frac{L(\hat\theta^{c}_{ML}|x)}{L(\hat\theta_{ML}|x)},$$

where $\hat\theta^{c}_{ML}$ is the constrained ML estimator. It can be
shown [10, 11] that likelihood ratio tests are UMP in their respective
classes. In our simulation analysis, we rely on an alternative testing
procedure, namely the Wald test, which inherits the optimality
properties of the likelihood ratio test on account of their asymptotic
equivalence. This choice is made primarily on the
basis of the ease of computation. The likelihood ra- 
tio test requires calculation of both restricted and unre- 
stricted estimators. The Wald test, on the other hand, 
requires only the unrestricted estimator. Since some of 
our hypothesis tests involve nonlinear constraints and 
estimation of the constrained model is cumbersome, we 
rely on the Wald testing procedure. 
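A Wald statistic for a (possibly nonlinear) restriction $c(\theta) = 0$ needs only the unrestricted estimate and its estimated covariance. The sketch below uses the standard delta-method form $W = c(\hat\theta)^T [G \hat V G^T]^{-1} c(\hat\theta)$ with $G = \partial c / \partial\theta'$. The numerical inputs (a three-component estimate, a diagonal covariance, and a restriction on the squared length of the parameter vector) are hypothetical values chosen for illustration, not results from this paper.

```python
import numpy as np

def wald_statistic(theta_hat, cov_hat, c, grad_c):
    """W = c(theta)^T [G V G^T]^{-1} c(theta) for the null hypothesis c(theta) = 0."""
    c_val = np.atleast_1d(c(theta_hat))      # restriction values, shape (q,)
    G = np.atleast_2d(grad_c(theta_hat))     # Jacobian of c, shape (q, n)
    return float(c_val @ np.linalg.solve(G @ cov_hat @ G.T, c_val))

# Hypothetical unrestricted estimate and estimated covariance matrix.
theta_hat = np.array([0.1, 0.2, 0.6])
cov_hat = np.diag((1.0 - theta_hat**2) / 100.0)

# Nonlinear restriction H0: |theta|^2 = 0.5 (illustrative choice).
c = lambda t: t @ t - 0.5
grad_c = lambda t: 2.0 * t

W = wald_statistic(theta_hat, cov_hat, c, grad_c)
CHI2_1_CRITICAL_5PCT = 3.841                 # 5% critical value of chi-square(1)
reject = W > CHI2_1_CRITICAL_5PCT
```

Under the null hypothesis $W$ is asymptotically chi-square distributed with as many degrees of freedom as there are restrictions, which is exactly the asymptotic approximation whose finite-sample accuracy is in question.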

Because of properties 1-4 and the fact that likelihood 
ratio tests are UMP, the maximum likelihood estimation 
methodology is considered the most desirable among fre- 
quentist estimation techniques. 
B. Method of Moments Estimators

An alternative method of frequentist inference, which may be used to
estimate the parameters of the quantum density matrix, is the Method
of Moments approach. Suppose that although the probability density
function $p(x_i|\theta)$ is unknown, $n$ moments of the density
function have analytical representations in terms of the parameter
vector $\theta$. Let these $n$ population moments be denoted by
$E[f(x_i)] = \mu(\theta)$, where
$f(x_i) = (f_1(x_i), f_2(x_i), \ldots, f_n(x_i))^T$ and
$\mu(\theta) = (\mu_1(\theta), \mu_2(\theta), \ldots, \mu_n(\theta))^T$.
The Method of Moments (MM) Estimator of $\theta$, denoted
$\hat\theta_{MM}$, is the value of the parameter vector that equates
the population moments with the corresponding sample moments:

$$\mu(\hat\theta_{MM}) = \frac{1}{m}\sum_{i=1}^{m} f(x_i).$$

Note that the Method of Moments approach involves 
solving a system of n (possibly nonlinear) equations in n 
unknown parameters. Like the ML estimator, the MM 
estimator is consistent and asymptotically normally dis- 
tributed. However, while the ML estimator exploits all 
the information contained in the likelihood of the data, 
the MM estimator only uses the information in a cho- 
sen set of moments of the data. Hence, unlike the ML 
estimator, the MM estimator is not asymptotically ef- 
ficient, i.e. its asymptotic variance does not attain the 
Cramer-Rao lower bound. The asymptotic distribution 
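of the MM estimator is:

$$\sqrt{m}\,[\hat\theta_{MM} - \theta_0] \xrightarrow{d} N[0, V],$$

where

$$V = D^{-1}\,\Omega\,(D^{-1})^T,$$

$$\Omega = E\left[(f(x_i) - \mu(\theta_0))\,(f(x_i) - \mu(\theta_0))^T\right], \qquad D = E\left[\frac{\partial\,(f(x_i) - \mu(\theta_0))}{\partial\theta}\right].$$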



C. Bayesian Estimators 

In the alternative paradigm of Bayesian estimation, 
the estimation error that arises is typically lower than 
that in frequentist estimation. Bayesian estimation dif- 
fers fundamentally from frequentist methods in that the 
parameters $\theta_i$ are treated as random variables. The 
goal is not to estimate a unique probability distribution, 
which can only truly be known in the limit of an infinite 
number of measurements, but to update a so-called prior 
plausibility distribution to a posterior plausibility distri- 
bution based (only) on the observed data. The posterior 
plausibility distribution is given by 



$$p(\theta \,|\, x \wedge I)\, d\theta = \frac{L(x|\theta)\, p(\theta|I)\, d\theta}{\int_\Theta L(x|\theta)\, p(\theta|I)\, d\theta},$$

where $p(\theta|I)$ denotes the prior plausibility distribution, i.e.,
the probability of the parameter vector taking on the value $\theta$
given our prior information $I$ regarding the parameter space,
$L(x|\theta)$ denotes the joint probability density, and $\Theta$
denotes the space of admissible parameters $\theta$.

Conditional simulation is required to retrieve quantities of interest,
including the parameter estimates, which are given formally by the
posterior means

$$\hat\theta_i = \frac{\int_\Theta \theta_i\, p(\theta \,|\, x \wedge I)\, d\theta}{\int_\Theta p(\theta \,|\, x \wedge I)\, d\theta}.$$

Unlike frequentist estimators, the notion of a confidence or credible
interval can be rigorously defined for finite samples only for
Bayesian estimators. This allows one to rigorously report finite
sample uncertainties. The $100c\%$ Bayesian credible interval for the
parameter $\theta_i$ is the interval $[a, b]$ such that

$$\int_{\theta_i \in [a, b]} p(\theta \,|\, x \wedge I)\, d\theta_1 \cdots d\theta_i \cdots d\theta_n = c,$$

where $\theta_i$ is integrated over $[a, b]$ and the remaining
parameters over their admissible ranges.
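The posterior mean and credible interval defined above can be illustrated with a one-parameter sketch. It assumes a binary outcome model with a flat prior on $(0, 1)$ and evaluates the posterior on a grid rather than by conditional simulation; the function names, the grid size, and the equal-tailed choice of $[a, b]$ are assumptions made for illustration.

```python
import numpy as np

def grid_posterior(k, m, n_grid=20000):
    """Posterior weights on a midpoint grid for k successes in m trials, flat prior."""
    theta = (np.arange(n_grid) + 0.5) / n_grid                     # midpoints in (0, 1)
    log_post = k * np.log(theta) + (m - k) * np.log(1.0 - theta)   # log likelihood + const
    weights = np.exp(log_post - log_post.max())                    # avoid underflow
    return theta, weights / weights.sum()

def posterior_summary(k, m, c=0.95):
    """Posterior mean and equal-tailed 100*c% credible interval [a, b]."""
    theta, w = grid_posterior(k, m)
    mean = float(np.sum(theta * w))
    cdf = np.cumsum(w)
    a = float(theta[np.searchsorted(cdf, (1.0 - c) / 2.0)])
    b = float(theta[np.searchsorted(cdf, (1.0 + c) / 2.0)])
    return mean, (a, b)

mean, (a, b) = posterior_summary(k=3, m=10)
```

With a flat prior the posterior here is a Beta distribution, so the grid result can be checked against the closed-form mean $(k + 1)/(m + 2)$.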



III. ESTIMATION OF THE DENSITY MATRIX 



A. Quantum estimation and the likelihood 
function for state reconstruction 



In this section we apply and extend the classical es- 
timation framework in Section II to estimation of the 
quantum density matrix. Quantum statistical inference 
is based on the notion of a quantum probability space. 

Definition 2 Consider a measurable space $(\chi, \mathcal{A})$, where
$\chi$ is the set of all possible measurement outcomes and
$\mathcal{A}$ is the $\sigma$-algebra of subsets of $\chi$. An
operator-valued probability measure (POVM) is a (set) function
$M : \mathcal{A} \to B(\mathcal{H})$, where $B(\mathcal{H})$ is the set
of bounded positive semidefinite, Hermitian linear operators on a
Hilbert space $\mathcal{H}$.

Definition 3 A quantum probability space is a measurable space
$(\chi, \mathcal{A})$, together with an operator-valued probability
measure $M$, such that the outcome $x \in \chi$ has probability density
function $p(x|\theta) = \mathrm{Tr}(\rho(\theta)F(x))$, where
$F(x) \in B(\mathcal{H})$ and $\rho(\theta)$ is a
positive-semidefinite, unit trace, Hermitian matrix (parametrized by a
vector $\theta$ of parameters) called the density matrix.

Note that $x$ in this context denotes the outcome of a single
measurement. The measure $M$ is explicitly defined in terms of the
operators $F(x)$, and an associated scalar-valued probability measure
$\mu$ satisfying $\mu(\chi) = 1$, through the relation
$M(A) = \int_A F(x)\,\mu(dx)$. For $N$-dimensional (finite) quantum
systems, we write

$$F(x_i) = F_i, \quad i = 1, \cdots, N^2 - 1, \qquad (2)$$

and denote the outcome of the $k$-th measurement by $F_{i_k}$. $F_i$ is
then an $N \times N$ positive-semidefinite Hermitian matrix. The
outcomes $x$ are indexed by the set of integers
$(1, \cdots, N^2 - 1)$. In this case, we have
$\mu(\chi) = \frac{1}{N}\mathrm{Tr}(M(\chi)) = 1$.

For simplicity of exposition, we collect all the distinct parameters
of the density matrix, $\rho$, into the $(N^2 - 1)$-dimensional vector,
$\theta$. The most convenient parameterization of $\rho(\theta)$
differs based on the state estimation method; various
parameterizations are discussed in Section IV. The likelihood function
for quantum state estimation is then

$$L(\theta|x) = \prod_{k=1}^{m} \mathrm{Tr}(\rho(\theta)F_{i_k}), \qquad (3)$$

which may be interpreted as the probability of obtaining the set of
observed outcomes for a given density matrix $\rho(\theta)$. The ML
estimator of the density matrix seeks to identify the admissible
parameter vector $\theta$ at which this likelihood is maximal.
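The likelihood (3) can be evaluated directly once a parameterization and a set of measurement operators are fixed. The sketch below is an illustration under stated assumptions: a single qubit with $\rho(r) = (I + r \cdot \sigma)/2$, and six operators $F = (I \pm \sigma_a)/6$ corresponding to projective measurements along $x$, $y$, $z$ chosen with equal probability. The operator set, the outcome ordering, and the function names are hypothetical choices, not the measurement scheme of this paper.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
]

def density_matrix(r):
    """rho(r) = (I + r . sigma) / 2 for a Bloch vector r."""
    return 0.5 * (I2 + sum(ri * s for ri, s in zip(r, PAULI)))

def measurement_operators():
    """Six operators (I +/- sigma_a) / 6, ordered x+, x-, y+, y-, z+, z-."""
    return [(I2 + sign * s) / 6.0 for s in PAULI for sign in (+1, -1)]

def log_likelihood(r, outcomes):
    """ln L = sum_k ln Tr(rho(r) F_{i_k}) for a list of outcome indices i_k."""
    rho = density_matrix(r)
    probs = np.array([np.trace(rho @ F).real for F in measurement_operators()])
    return float(np.sum(np.log(probs[np.asarray(outcomes)])))

value = log_likelihood([0.0, 0.0, 0.5], outcomes=[4])  # one z+ outcome
```

The operators sum to the identity, so the outcome probabilities $(1 \pm r_a)/6$ sum to one for any admissible Bloch vector.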



B. Quantum measurement bases 

A resolution of the identity on a Hilbert space $\mathcal{H}$ of
quantum states is a normalized operator-valued measure. Generally, the
resolution of the identity satisfies

$$M(\chi) = \int_\chi F(x)\,\mu(dx) = I.$$

For finite-dimensional systems, to which we restrict our attention,

$$\sum_i F_i\,\mu(x_i) = I_N,$$

where $I_N$ denotes the $N \times N$ identity matrix.

An important feature of quantum probability is that the operators
$F_i$ do not all mutually commute. The subsets $A \in \mathcal{A}$ of
the space of possible measurement outcomes $\chi$ may be chosen to be
pairwise disjoint and associated with subsets
$M_A = \{F_1, \cdots, F_{N-1}\}$ of commuting observables whose members
do not commute with those of any other subset. The $M_A$ are then said
to constitute distinct measurement "bases".

Writing each $F_i$ as an $N \times N$ Hermitian matrix, it is
convenient to represent each basis $M_A^{(r)}$ in terms of an
$N \times N$ matrix of common eigenvectors $V^{(r)}$,
$1 \leq r \leq N + 1$. Given that the density matrix is a function of
$N^2 - 1$ independent parameters, the minimal cardinality resolution of
the identity must be composed of $N + 1$ subsets $A \in \mathcal{A}$.
We note, however, that many resolutions of the identity are redundant
in that they are associated with $n > N + 1$ bases $M_A$.

In the current work, the data $x$ consist of $m_i$ measurement
outcomes in each of $p$ measurement bases with
$\sum_{i=1}^{p} m_i = m$. The measurement bases used are discussed
further in Section IV B.



C. Quantum maximum likelihood estimation 

Among frequentist estimation techniques, ML has been employed most
extensively for reconstruction of quantum states. In quantum ML
estimation, we aim to identify the maximum of the likelihood function
(3) over the set of admissible density matrices. All parametrizations
of the density matrix require the imposition of constraints on the
parameter vector $\theta$ (see Section IV); these constraints are
necessary for expression (3) to be a well-defined likelihood. Assuming
the constraints on the parameter vector $\theta$ are of the general
form $a_j(\theta) \geq 0$, $j = 1, \cdots, N$, the problem can be
formulated in terms of the Lagrangian function

$$\mathcal{L}(\theta, \lambda, \gamma \,|\, x) = \ln\left[\prod_{k=1}^{m} \mathrm{Tr}(\rho(\theta)F_{i_k})\right] + \sum_{j=1}^{N} \lambda_j\,(a_j(\theta) - \gamma_j), \qquad (4)$$

where the first term is $\ln L(\theta|x)$ in the absence of
constraints (i.e., $\rho(\theta)$ need not be an admissible density
matrix and $L$ need not be a well-defined likelihood), the $\gamma_j$
denote slack variables ($\gamma_j = 0$ in the case of an equality
constraint) and the $\lambda_j$ denote Lagrange multipliers. It is
convenient to order the $N$ constraints such that the first constraint
enforces the unit trace of $\rho$, and the following $N - 1$
constraints enforce its positive semidefiniteness. Note that
$L(\theta|x)$ is a well-defined likelihood function only in the
presence of these constraints. For parameterizations where positive
semidefiniteness is implicit in the parametrization (such as the
Cholesky parametrization), $\lambda_j = 0$, $j = 2, \cdots, N$, and for
parameterizations where the unit trace constraint is implicit in the
parameterization (such as the Bloch vector parametrization, Section
IV), $\lambda_1 = 0$. We denote the vector of parameters
$(\theta, \lambda, \gamma) = t$. Finding the optimum corresponding to
this Lagrangian entails searching for the parameter vector $t$ that
renders the gradient vector $\nabla L(\theta)$ and a linear
combination of $\nabla(a_j(\theta) - \gamma_j)$, $j = 1, \ldots, N$,
parallel. There are two common approaches to solving this problem:
1) minimization of the "sum of squares" (of the first-order
conditions) function
$\left(\frac{\partial\mathcal{L}}{\partial t}\right)^T\left(\frac{\partial\mathcal{L}}{\partial t}\right)$;
2) finding the roots of the system of nonlinear equations
$\frac{\partial\mathcal{L}}{\partial t} = 0$ using the Newton-Raphson
(NR) method. In fact, methods 1) and 2) may be combined to produce a
globally convergent NR algorithm. Further details on solving the
constrained optimization problem are provided in Section V.

Note that the ML estimator obtained by maximizing the likelihood
defined by the Lagrangian (4) is consistent, asymptotically normally
distributed, and has an asymptotic covariance matrix equal to the
inverse of $m$ times the expected Fisher information matrix,
$(mI(\theta_0))^{-1}$ (see Section II for details). We estimate the
expected Fisher information using equation (1) in Section II.
D. Asymptotic properties of quantum maximum 
likelihood estimators 



In quantum statistics, there are multiple Cramer-Rao 
type inequalities, each with its own associated (quan- 
tum) Fisher information. Some of these correspond to 
particular measurement strategies, whereas others are in 
fact unachievable. Work in quantum probability theory 
[1] has indicated that $\frac{1}{m}I(\theta_0)^{-1}$ for an arbitrary
choice of measurement bases is generally not the tightest asymptotic
lower bound achievable in quantum ML estimation.
However, the measurements that maximize the Fisher 
information depend on the true, unknown state of the 
quantum system, rendering the practical utility of the 
notion of the tightest possible Cramer-Rao bound ques- 
tionable. 

Although the choice of measurement bases that can achieve the tightest
possible Cramer-Rao bound depends on the true $\rho$, there exists an
approach to optimal measurement that is agnostic to the true value of
$\rho$. Wootters proposed a construction of measurement bases that
maximizes the average information (over the set of all possible
density matrices) obtained via a set of $m$ measurements. These
so-called mutually unbiased measurement bases (MUB) are "maximally
noncommutative" in the sense that a measurement in one basis provides
no information as to the outcome of a measurement over a basis
unbiased with respect to the current one. Let $I(\rho_0)$ denote the
Fisher information given true state $\rho_0 = \rho(\theta_0)$. MUB aims
to maximize the average Fisher information over all possible
$\rho_0$'s:

$$\langle I \rangle = \frac{1}{V_0}\int_\Theta I(\theta, \rho_0)\, d\rho_0,$$

where $\Theta$ again denotes the admissible parameter space and $V_0$
is the volume of $\Theta$, by an appropriate choice of measurement
bases (as discussed in Section IV, $\Theta = B_{N^2-1}$, the $N^2 - 1$
dimensional Bloch vector space of admissible $N \times N$ dimensional
density matrices). It can be shown that this is equivalent to
maximizing the average Kullback-Leibler (KL) information gain
$\langle D \rangle$ upon updating the flat prior distribution to the
asymptotic multivariate normal distribution^2. The KL information gain
is given by

$$D = \int_\Theta f(\theta)\,\ln\frac{f(\theta)}{g(\theta)}\, d\theta,$$

where in the present case $f(\theta)$ is the asymptotic multivariate
normal distribution over the Bloch vector space after measurements and
estimation, and $g(\theta)$ is the uniform distribution on the Bloch
vector space. The KL gain is defined in terms of the Shannon
information of a distribution,
$E_\theta(\ln f(\theta)) = \int f(\theta)\ln f(\theta)\, d\theta$. A
measurement in each basis restricts the variance in $N - 1$
directions, leaving it infinitely broad in the other $N^2 - N$. For
estimation of a single parameter, the Shannon entropy of the
asymptotic normal distribution of parameter estimates is
$-\frac{1}{2}\ln(\pi e) - \ln(\sigma)$.

Maximizing the information gain is equivalent to mini- 
mizing the "uncertainty volume" in the parameter space; 
in the absence of measurements, this is equal to the vol- 
ume of the Bloch vector space. The uncertainty distance 
for estimation of a single parameter is the standard de- 
viation of the estimator; the uncertainty volume is the 
product of the standard deviations of the estimators for 
each of the parameters. In terms of uncertainty volumes, the
information gained by measurements is

$$D = \ln\left(\frac{W_0}{W}\right) - \frac{N^2 - 1}{2}\ln(\pi e),$$

where $W$ is the uncertainty volume after measurements and estimation
and $W_0$ is the volume of the Bloch vector space.

Denote by $T_r$ the $(N - 1)$-dimensional subspace of $su(N)$ or
$B_{N^2-1}$ associated with measurement basis $V^{(r)}$. The total
uncertainty volume is diminished by overlaps between the subspaces
$T_r$. It can be shown that this total volume $W$ may be written
$W = \frac{W_1 \cdots W_{N+1}}{\mathrm{vol}(T_1, \cdots, T_{N+1})}$,
where $T_r$ is the $(N - 1)$-dimensional subspace of $su(N)$
associated with measurement basis $V^{(r)}$ and
$(T_1, \cdots, T_{N+1})$ denotes the $(N^2 - 1)$-dimensional
parallelepiped whose edges are the $N + 1$ sets of eigenvectors
associated with each of the subspaces $T_r$. Thus the Kullback-Leibler
information gain (Section III D) in updating the flat prior
distribution to the asymptotically normal distribution is then

$$\langle D \rangle = -\sum_{r=1}^{N+1} \langle \ln(W_r) \rangle + \ln[\mathrm{vol}(T_1, \cdots, T_{N+1})] + \ln(W_0) - \frac{N^2 - 1}{2}\ln(\pi e).$$

$\langle \ln(W_r) \rangle$, the log of the average uncertainty volume
in subspace $T_r$, does not depend on the choice of measurements since
it is the log of the product of standard deviations of multinomial
parameters in a single measurement basis, averaged over all possible
multinomial parameters $p_1, \cdots, p_N$. Thus the average
Kullback-Leibler or Fisher information is a function of only
$\mathrm{vol}(T_1, \cdots, T_{N+1})$, which in turn is determined by
the relative orientations of the bases (and not the parameters). The
total uncertainty volume is minimized when $(T_1, \cdots, T_{N+1})$ is
a rectangular solid with all the unit vectors defining the edges being
orthogonal; this is equivalent to the condition that the subspaces
$T_1, \cdots, T_{N+1}$ are mutually orthogonal. Wootters showed that
this condition is equivalent to requiring that

$$|\langle v_i^{(r)}, v_j^{(r')} \rangle| = \frac{1}{\sqrt{N}}, \qquad (5)$$



^2 In frequentist statistics, the relative entropy is always defined 
in terms of the passage from the flat plausibility distribution to 
an asymptotically (multivariate) normal distribution. 



where $v_i^{(r)}$, $v_j^{(r')}$ are column vectors in the bases
$V^{(r)}$, $V^{(r')}$ respectively, with $r \neq r'$, and
$|\langle \cdot, \cdot \rangle|$ denotes the modulus of the Hermitian
inner product. Whereas mutual nonorthogonality of the edges of the
parallelepiped may decrease the asymptotic uncertainty volume in
particular subspaces $T_r$, the total asymptotic uncertainty volume is
always increased by such nonorthogonality. Explicit formulas for
measurement bases that satisfy (5) are known in the cases where the
Hilbert space dimension $N$ is the power of a prime, and are discussed
in Section IV B.
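Condition (5) is easy to check numerically for a candidate set of bases. The sketch below does so for $N = 2$, assuming the three bases are the eigenbases of the Pauli operators $\sigma_z$, $\sigma_x$ and $\sigma_y$ (a standard example of mutually unbiased bases for a qubit); the helper name `max_deviation_from_unbiased` is a hypothetical choice.

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
# Columns of each matrix are the eigenvectors of one Pauli operator.
BASES = [
    np.array([[1, 0], [0, 1]], dtype=complex),        # sigma_z eigenbasis
    s * np.array([[1, 1], [1, -1]], dtype=complex),   # sigma_x eigenbasis
    s * np.array([[1, 1], [1j, -1j]], dtype=complex), # sigma_y eigenbasis
]

def max_deviation_from_unbiased(bases):
    """Largest | |<v_i^(r), v_j^(r')>| - 1/sqrt(N) | over all pairs of distinct bases."""
    n_dim = bases[0].shape[0]
    worst = 0.0
    for r in range(len(bases)):
        for rp in range(r + 1, len(bases)):
            overlaps = np.abs(bases[r].conj().T @ bases[rp])  # moduli of inner products
            worst = max(worst, float(np.max(np.abs(overlaps - 1.0 / np.sqrt(n_dim)))))
    return worst

deviation = max_deviation_from_unbiased(BASES)  # zero up to rounding for a MUB set
```

The same function can be applied to a suboptimal set of bases, where the deviation quantifies how far the set is from satisfying (5).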

An unresolved question in the literature is the mag- 
nitude of the information loss for finite sample sizes in- 
curred due to not using MUB or other approaches to 
optimal quantum measurement. In many experimen- 
tal setups, it is not convenient to use these specialized 
bases. We aim to clarify the practical utility of optimal 
quantum measurements in ML estimation, and assess 
the extent to which frequentist quantum estimation can 
effectively make use of the associated optimal efficien- 
cies. 



E. Estimation of the density matrix by 
tomographic inversion 

An alternative frequentist method of quantum state estimation that is
not based on a likelihood function is tomographic inversion. This
involves an application of the Method of Moments approach described in
Section II. Adopting the notation $x_{i_k}$ for the $k$-th observation
returning outcome $i$, let the MM estimator functions be given by
$f_j(x_{i_k}) = \delta_{i_k j}$. We then have
$E[f_j(x_{i_k})] = p_j = \mathrm{Tr}(\rho(\theta)F_j)$, where $p_j$
denotes the probability of observing outcome $F_j$. The tomographic
inversion method estimates the parameters by equating these population
moments to the corresponding sample moments: the parameter estimates
$\hat\theta_{MM,j}$, $1 \leq j \leq N^2 - 1$, are obtained by inverting
a system of equations of the form

$$\mathrm{Tr}(\rho(\hat\theta)F_j) = c_j, \quad 1 \leq j \leq N^2 - 1, \qquad (6)$$

where $c_j$ denotes the frequency with which outcome $F_j$ is observed
in the sample. Introducing the notation
$A_{ij} = \frac{\partial\,\mathrm{Tr}(\rho(\theta)F_i)}{\partial\theta_j}$,
one may solve for the estimated parameter vector as
$\hat\theta = A^{-1}c$ for any parameterization $\rho(\theta)$ that is
linear in $\theta$ (see Section IV).
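A small numerical example shows how inverting the moment equations can leave the admissible set. The sketch assumes a single qubit measured along the three Pauli axes, for which the probability of the "+" outcome along axis $a$ is $(1 + r_a)/2$, so the inversion reduces to $\hat r_a = 2c_a - 1$. The counts are hypothetical and the function names are illustrative.

```python
import numpy as np

def linear_inversion(n_plus, n_total):
    """Invert (1 + r_a) / 2 = c_a for each Pauli axis: r_a = 2 c_a - 1."""
    c = np.asarray(n_plus, dtype=float) / np.asarray(n_total, dtype=float)
    return 2.0 * c - 1.0

def min_eigenvalue(r):
    """Smallest eigenvalue of rho = (I + r . sigma) / 2, equal to (1 - |r|) / 2."""
    return 0.5 * (1.0 - np.linalg.norm(r))

# Hypothetical counts: 10 measurements per axis, with 9, 9 and 8 "+" outcomes.
r_hat = linear_inversion(n_plus=[9, 9, 8], n_total=[10, 10, 10])  # -> [0.8, 0.8, 0.6]
unphysical = min_eigenvalue(r_hat) < 0.0  # |r_hat| > 1 gives a negative eigenvalue
```

Here $|\hat r|^2 = 1.64$, so the reconstructed matrix has a negative eigenvalue even though every individual frequency is a valid probability.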

However, this method has two major drawbacks. 
First, since no parametrization $\rho(\theta)$ guarantees satisfaction
of each of the positive-semidefiniteness, unit trace, and Hermiticity
constraints on $\rho$ (see Section IV), direct inversion can yield
unphysical density matrix estimates.
Second, while the ML estimator exploits all the infor- 
mation contained in the likelihood of the data, the MM 
estimator only uses the information in a chosen set of 
moments of the data. Hence, unlike the ML estima- 
tor, the MM estimator is not asymptotically efficient, 
i.e. its asymptotic variance does not attain the Cramer- 
Rao lower bound. For these reasons, we do not consider 
tomographic inversion in our assessment of the perfor- 
mance of frequentist quantum estimation. 
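
The first drawback can be illustrated with a minimal sketch for a single qubit. It assumes projective measurements along the three Pauli axes, for which the inversion of equation (6) reduces to $\theta_j = (n_+ - n_-)/(n_+ + n_-)$; the function names and the counts below are hypothetical and are not taken from the simulations reported here.

```python
import math

def linear_inversion_qubit(counts):
    """Method-of-moments (tomographic inversion) estimate of a qubit Bloch vector.

    counts maps each Pauli axis to (n_plus, n_minus), the numbers of +1 and -1
    outcomes observed along that axis (hypothetical input format).
    Since Tr(rho F_+) = (1 + theta_j) / 2, equating this probability to the
    observed frequency gives theta_j = (n_plus - n_minus) / (n_plus + n_minus).
    """
    theta = []
    for axis in ("x", "y", "z"):
        n_plus, n_minus = counts[axis]
        theta.append((n_plus - n_minus) / (n_plus + n_minus))
    return theta

def is_physical(theta):
    """For N = 2, rho is positive semidefinite iff |theta| <= 1."""
    return math.sqrt(sum(t * t for t in theta)) <= 1.0

# Illustrative small-sample data: 10 observations per axis.
counts = {"x": (9, 1), "y": (8, 2), "z": (9, 1)}
theta_hat = linear_inversion_qubit(counts)
physical = is_physical(theta_hat)
```

With these counts the estimated Bloch vector has norm greater than one, so the inverted density matrix has a negative eigenvalue even though every individual frequency is a valid probability.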



F. Bayesian versus frequentist density matrix 
estimation 

In the alternative, Bayesian approach to quantum 
state estimation, the posterior distribution p{0 | a; A /) 
(111) takes the form: 



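$$p(\theta \mid x \wedge I)\, d\theta = \frac{L(x \mid \theta)\, p(\theta \mid I)\, d\theta}{\int_\Theta L(x \mid \theta)\, p(\theta \mid I)\, d\theta} \qquad (7)$$

$$= \frac{\left[\prod_k \mathrm{Tr}(F_{i_k} \rho(\theta))\right] p(\theta)\, d\theta}{\int_\Theta \left[\prod_k \mathrm{Tr}(F_{i_k} \rho(\theta))\right] p(\theta)\, d\theta}. \qquad (8)$$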



The density matrix can be estimated by the poste- 
rior mean of each of its elements (corresponding to a 
quadratic "loss function"), namely 



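$$\hat\rho_{x \wedge I} = \int_\Theta \rho(\theta)\, p(\theta \mid x \wedge I)\, d\theta \qquad (9)$$

$$= \frac{\int_\Theta \rho(\theta) \left[\prod_k \mathrm{Tr}(F_{i_k} \rho(\theta))\right] p(\theta)\, d\theta}{\int_\Theta \left[\prod_k \mathrm{Tr}(F_{i_k} \rho(\theta))\right] p(\theta)\, d\theta}. \qquad (10)$$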



Alternative loss functions can be used to retrieve other 
estimable quantities of interest. 

Bayesian credible intervals can be obtained according 
to expression pi C|) by sampling from the posterior den- 
sity (O. These credible intervals do not rely on asymp- 
totic results / Fisher information. We do not compute 
Bayesian integrals in this work; our goal is rather to 
determine whether such a need exists given the finite 
sample performance of the computationally simpler fre- 
quentist estimators. 



IV. DENSITY MATRIX PARAMETERIZATION 
AND CHOICE OF MEASUREMENT BASES 

A. Bloch vector parameterization 

In maximum likelihood estimation of the quantum 
density matrix, a constrained optimization must be car- 
ried out, where the constraints correspond to preserva- 
tion of the unit trace and/or positive semidefiniteness 
properties of the density matrix. The dimension of the 
parameter space increases quadratically with the Hilbert 
space dimension, necessitating the use of efficient pa- 
rameterizations of the density matrix. The three most 
commonly used parameterizations are the Bloch vector 
[H, Euler angle [T6| and the Cholesky parameteriza- 
tions. Within the last few years, considerable advance- 
ments have been made in extending the Bloch and Euler 
parameterizations to arbitrary $N$-dimensional Hilbert spaces. The Euler angle parameterization of the density matrix, which is based on the Euler angle parameterization of the special unitary group $SU(N)$, employs the generators in the Lie algebra $su(N)$ to parameterize $\rho$ in terms of the Lie algebra exponential map. The constraint equations are linear in the parameters, but because the parameters appear as exponents, the likelihood takes on a complicated form. In the Cholesky parameterization, $\rho = A^\dagger A$, where $A$ is upper triangular with real elements on the diagonal; $\rho$ is then automatically positive semidefinite, but this parameterization is associated with a nonlinear expression for the likelihood and is not standard in other applications. Here, we employ the so-called Bloch vector parameterization, where the probability of an observable outcome according to the Born rule, $\mathrm{Tr}(\rho(\theta) F_i)$, is a simple linear function of the parameter vector $\theta$. Moreover, the Bloch vector parameterization is perhaps the most commonly used in the statistical physics of finite-dimensional quantum systems (especially in quantum information applications). Most importantly, asymptotic standard errors in the Bloch vector parameterization can be computed using the standard methods described in Section II because the positive-semidefiniteness constraints are inequality restrictions that are nonbinding at the optimum.

In the Bloch vector parameterization, the Hermitian operator $\rho$ is parameterized in terms of an orthogonal basis $\{\lambda_j\}$, $1 \le j \le N^2 - 1$, for the vector space of traceless Hermitian operators on an $N$-dimensional Hilbert space. In two dimensions, these are the familiar Pauli spin matrices, whereas in three dimensions they are the so-called Gell-Mann matrices. $\rho$ can then be written

$$\rho = \frac{1}{N} I_N + \frac{1}{2} \sum_{j=1}^{N^2-1} \theta_j \lambda_j, \qquad (\theta_1, \dots, \theta_{N^2-1}) = \theta \in B_{N^2-1} \subset \mathbb{R}^{N^2-1},$$

where the $N^2 - 1$ matrices $\lambda_j$ satisfy the conditions a) $\lambda_j = \lambda_j^\dagger$, b) $\mathrm{Tr}(\lambda_j) = 0$, c) $\mathrm{Tr}(\lambda_i \lambda_j) = 2\delta_{ij}$. These are the defining conditions of the generators of the Lie group $SU(N)$ that generalize the Pauli spin matrices. The $\theta_j$ are given by $\theta_j(\rho) = \mathrm{Tr}(\lambda_j \rho)$ (i.e., are expectation values of the observable generators). The vector $\theta$ of coefficients of the $\lambda_j$ is called the Bloch vector.
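
As a small numerical check of this parameterization for $N = 2$, the following sketch builds $\rho$ from a Bloch vector and recovers $\theta_j = \mathrm{Tr}(\lambda_j \rho)$. It assumes the standard Pauli matrices as generators and uses the true spin-1/2 values reported in Table I; the helper names are illustrative.

```python
# Pauli matrices as nested lists (the N = 2 generators lambda_j).
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def rho_from_bloch(theta):
    """rho = I/2 + (1/2) sum_j theta_j sigma_j for N = 2."""
    rho = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]
    for t, s in zip(theta, SIGMA):
        for i in range(2):
            for j in range(2):
                rho[i][j] += 0.5 * t * s[i][j]
    return rho

theta = [-0.44, -0.02, 0.19]   # true values listed in Table I
rho = rho_from_bloch(theta)
recovered = [trace(matmul(s, rho)).real for s in SIGMA]   # Tr(lambda_j rho)
unit_trace = trace(rho).real
two_a2 = 0.5 - 0.5 * sum(t * t for t in theta)            # (N-1)/N - |theta|^2 / 2
```

Here `rho[0][0]` equals $(1 + \theta_3)/2 = 0.595$, consistent with the value of $\rho_{11}$ listed in Table I, and `two_a2` is positive, so this $\theta$ lies inside the Bloch ball.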

$B_{N^2-1}$ is a compact convex subset of $\mathbb{R}^{N^2-1}$. Let $a_i(\theta)$ denote the coefficients of the characteristic polynomial of $\rho$, $\det(x I_N - \rho)$, where $\rho$ takes the form

^ In the alternative Euler angle parameterization, neither the unit 
trace nor the positive semidefiniteness constraints are automat- 



above. The unit trace constraint is automatically satisfied in the Bloch vector parameterization. It can be shown that the conditions of Hermiticity and positive-semidefiniteness of $\rho$ correspond to the following definition of the "Bloch vector set" $B_{N^2-1}$ of admissible values of $\theta$:

$$B_{N^2-1} = \{\theta \in \mathbb{R}^{N^2-1} \mid a_i(\theta) \ge 0, \; i = 1, \dots, N\}. \qquad (11)$$

This follows from the standard result that the roots of a characteristic polynomial are all nonnegative iff the coefficients $a_i$ of the polynomial are nonnegative [3]. The $a_i$ in the above definition of $B$ are themselves polynomials in $\theta$ whose coefficients can be expressed in terms of the structure constants of the Lie algebra $su(N)$ of traceless Hermitian matrices, and will be written explicitly below for $N = 2, 3, 4$. The structure constants, which characterize the generators of $su(N)$, are the elements of the completely antisymmetric and completely symmetric tensors $f$ and $g$, respectively defined by the relations:



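$$[\lambda_i, \lambda_j] = 2i f_{ijk} \lambda_k$$

$$[\lambda_i, \lambda_j]_+ = \frac{4}{N} \delta_{ij} I_N + 2 g_{ijk} \lambda_k,$$

which can be solved for $f_{ijk}$, $g_{ijk}$:

$$f_{ijk} = \frac{1}{4i} \mathrm{Tr}\{[\lambda_i, \lambda_j] \lambda_k\}$$

$$g_{ijk} = \frac{1}{4} \mathrm{Tr}\left\{\left([\lambda_i, \lambda_j]_+ - \frac{4}{N} \delta_{ij} I_N\right) \lambda_k\right\},$$

where $[\,,]$ denotes the commutator and $[\,,]_+$ denotes the anticommutator, and where we have used the Einstein (implicit) summation convention for repeated indices. It can be shown that the $\lambda_i$ that satisfy these conditions can be expressed:

$$\{\lambda_i\}_{i=1}^{N^2-1} = \{u_{jk}, v_{jk}, w_l\}$$

where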

ically satisfied. 



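$$\{u_{jk}\} = \{\, |j\rangle\langle k| + |k\rangle\langle j| \;:\; 1 \le j < k \le N \,\}$$

$$\{v_{jk}\} = \{\, -i\,(|j\rangle\langle k| - |k\rangle\langle j|) \;:\; 1 \le j < k \le N \,\}$$

$$\{w_l\} = \left\{ \sqrt{\frac{2}{l(l+1)}} \left( \sum_{j=1}^{l} |j\rangle\langle j| - l\, |l+1\rangle\langle l+1| \right) \;:\; 1 \le l \le N - 1 \right\}.$$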

1. Spin-1/2 systems

When $N = 2$, the conditions $a_i \ge 0$ (equation (11)) correspond to $1!\,a_1 = 1$ and

$$2!\,a_2 = \frac{N-1}{N} - \frac{1}{2}|\theta|^2 \ge 0. \qquad (12)$$

The latter condition defines the familiar Bloch sphere for spin-1/2 systems. The Lagrangian ([4]) in this case becomes



C{e, A) = In 



.k=l 



where we have omitted the constant term originating 
from fli since it is independent of 6. 



2. Spin-1 systems 



For $N = 3$, in addition to the constraint (12), we have

$$3!\,a_3 = \frac{(N-1)(N-2)}{N^2} - \frac{3(N-2)}{2N}|\theta|^2 + \frac{1}{2} g_{ijk} \theta_i \theta_j \theta_k \ge 0, \qquad (13)$$

where the structure constants gijk are components of the 
completely symmetric tensor of the Lie algebra su{N) 
(Eq. [T2|) . The Lagrangian ([4]) for spin-1 systems then 
becomes 



C{9, A) = In 



nTr(p(0)F,J 



Lfc=l 



iV- 1 1 



N 



A, 



(7V-l)(7V-2) 3(iV-2)|^|2 



iV2 



2N 



-9ijkt 



^fe - 73 



where we have (again) used the Einstein summa- 
tion convention for repeated indices, i.e., gijkQidjOk = 

Note that while for $N = 2$ the Bloch vector space is exactly a ball, the additional constraints starting with $a_3 \ge 0$ restrict the Bloch vector space for $N = 3$ and higher dimensions to a proper subset of a ball. Since the structure constants of $su(N)$ for $N \ge 3$ have no rotational invariance, neither do these conditions. The Bloch vector space has an asymmetric structure in $\mathbb{R}^{N^2-1}$ for $N \ge 3$.

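In the Bloch vector parameterization, the Fisher information takes on a particularly simple analytical form. For $N = 2$, we have for the score vectors

$$\frac{\partial \ln L(\theta \mid x)}{\partial \theta_1} = \sum_{k=1}^{m} \frac{1}{2\,\mathrm{Tr}(\rho F_{i_k})} \left[F_{i_k}(1,2) + F_{i_k}(2,1)\right]$$

$$\frac{\partial \ln L(\theta \mid x)}{\partial \theta_2} = \sum_{k=1}^{m} \frac{i}{2\,\mathrm{Tr}(\rho F_{i_k})} \left[F_{i_k}(1,2) - F_{i_k}(2,1)\right]$$

$$\frac{\partial \ln L(\theta \mid x)}{\partial \theta_3} = \sum_{k=1}^{m} \frac{1}{2\,\mathrm{Tr}(\rho F_{i_k})} \left[F_{i_k}(1,1) - F_{i_k}(2,2)\right]$$

with

$$\mathcal{I}(\theta) = E\left[\frac{\partial \ln L(\theta)}{\partial \theta} \left(\frac{\partial \ln L(\theta)}{\partial \theta}\right)^T\right].$$

The Fisher information decomposes similarly to $N = 2$ for $N = 3$ due to the linearity of the Born probability in $\theta$.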
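
The linearity of the Born probability makes the score easy to evaluate. The sketch below is an illustration for $N = 2$, assuming Pauli generators and a short, made-up list of observed projectors (the names `born_prob`, `score`, and `numeric_score` are hypothetical). It computes $\partial \ln L / \partial \theta_j = \sum_k \mathrm{Tr}(\lambda_j F_{i_k}) / (2\,\mathrm{Tr}(\rho F_{i_k}))$ and compares it with a central finite difference of the log-likelihood.

```python
import math

def pauli_traces(F):
    """(Tr(sigma_1 F), Tr(sigma_2 F), Tr(sigma_3 F)) for a Hermitian 2x2 F."""
    return (
        (F[0][1] + F[1][0]).real,
        (1j * (F[0][1] - F[1][0])).real,
        (F[0][0] - F[1][1]).real,
    )

def born_prob(theta, F):
    """Tr(rho(theta) F) with rho = I/2 + (1/2) theta . sigma (linear in theta)."""
    tr_f = (F[0][0] + F[1][1]).real
    return 0.5 * tr_f + 0.5 * sum(t * s for t, s in zip(theta, pauli_traces(F)))

def log_likelihood(theta, observed):
    return sum(math.log(born_prob(theta, F)) for F in observed)

def score(theta, observed):
    """Analytical gradient of the log-likelihood with respect to theta."""
    g = [0.0, 0.0, 0.0]
    for F in observed:
        p = born_prob(theta, F)
        for j, s in enumerate(pauli_traces(F)):
            g[j] += s / (2.0 * p)
    return g

def numeric_score(theta, observed, h=1e-6):
    """Central finite-difference gradient, used only as a cross-check."""
    g = []
    for j in range(3):
        up, dn = list(theta), list(theta)
        up[j] += h
        dn[j] -= h
        g.append((log_likelihood(up, observed) - log_likelihood(dn, observed)) / (2 * h))
    return g

# Projectors onto Pauli eigenvectors (illustrative observations).
P_Z_UP = [[1, 0], [0, 0]]
P_Z_DN = [[0, 0], [0, 1]]
P_X_PLUS = [[0.5, 0.5], [0.5, 0.5]]
P_Y_MINUS = [[0.5, 0.5j], [-0.5j, 0.5]]

theta = [-0.44, -0.02, 0.19]
observed = [P_Z_UP, P_Z_UP, P_Z_DN, P_X_PLUS, P_Y_MINUS]
analytic = score(theta, observed)
numeric = numeric_score(theta, observed)
```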



B. Mutually unbiased (average case optimal) 
measurements 



As discussed in Section III B, quantum measurement strategies capable of fully reconstructing the density matrix can be characterized by a set of $N + 1$ measurement bases (matrices of eigenvectors) $V^{(r)}$, $0 \le r \le N$. Each such choice yields a different asymptotic variance for the ML estimator, i.e., a different Fisher information matrix.

In the present work, we focus on the use of mutually unbiased measurement bases. For 1-qubit systems ($N = 2$), MUB bases $V^{(r)}$, $0 \le r \le N$, can be written

The observables are then simply the standard Pauli spin 
operators. 

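For $N = 3$, or more generally when $N$ is the power of an odd prime, these bases $V^{(r)}$ are given by

$$V^{(0)}_{pq} = \delta_{pq}, \qquad r = 0,$$

$$V^{(r)}_{pq} = \frac{1}{\sqrt{N}} \exp\left[\frac{2\pi i}{N}(r p^2 + pq)\right], \qquad 1 \le r \le N.$$

The orthonormal observables can then be taken to be

$$F_i = |i\rangle\langle i| = \mathrm{diag}(0, \dots, 1, \dots, 0), \qquad 1 \le i \le N - 1.$$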

Recall that the MUB measurements maximize the average Fisher information over the set of all true density matrices $\rho_0$, and hence are "average-case optimal" as discussed in Section III D.
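
The mutual unbiasedness of these bases can be checked numerically. The following sketch assumes the construction above with zero-based indices $p, q = 0, \dots, N-1$ and treats the columns of each $V^{(r)}$ as basis vectors (both are assumptions about conventions); it verifies for $N = 3$ that every pair of vectors from different bases has overlap $1/\sqrt{N}$.

```python
import cmath
import math

def mub_bases(N):
    """Return N + 1 bases for an odd prime N; each basis is a list of N vectors."""
    identity = [[1.0 + 0j if p == q else 0j for p in range(N)] for q in range(N)]
    bases = [identity]
    for r in range(1, N + 1):
        basis = []
        for q in range(N):
            # Column q of V^(r): entries exp[(2 pi i / N)(r p^2 + p q)] / sqrt(N).
            vec = [cmath.exp(2j * math.pi * (r * p * p + p * q) / N) / math.sqrt(N)
                   for p in range(N)]
            basis.append(vec)
        bases.append(basis)
    return bases

def overlap(u, v):
    """|<u, v>| for two complex vectors."""
    return abs(sum(a.conjugate() * b for a, b in zip(u, v)))

N = 3
bases = mub_bases(N)

# Overlaps between vectors drawn from two different bases.
cross = [overlap(u, v)
         for a in range(N + 1) for b in range(a + 1, N + 1)
         for u in bases[a] for v in bases[b]]
max_dev = max(abs(c - 1.0 / math.sqrt(N)) for c in cross)

# Overlaps between distinct vectors of the same basis (should vanish).
within = [overlap(basis[i], basis[j])
          for basis in bases for i in range(N) for j in range(i + 1, N)]
max_within = max(within)
```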



C. Complete, average case suboptimal 
measurements 

In order to interrogate the asymptotic and finite sample losses induced by using biased or suboptimal measurement bases, the MUB bases are rotated, causing the associated parallelepiped $(T_1, \dots, T_{N+1})$ in Section III D to no longer be a rectangular solid and the average Fisher information to decrease. After rotation, the new bases can be written

$$\tilde{V}^{(r)} = U(s)\, V^{(r)}\, U^\dagger(s), \qquad (16)$$

where $U(s) = e^{isA}$, $A$ being a random Hermitian matrix specifying a random axis of rotation in the $N$-dimensional Hilbert space; $s$ is a scalar parameter specifying the extent of rotation (magnitude of the solid angle). To generate a set of measurement bases that is sufficiently different from the MUB, $N$ measurement bases were rotated according to the above formula; for basis $r$, the parameter $s$ was incrementally increased until

$$|\langle v_i^{(r)}, v_j^{(r')} \rangle| \ge a \frac{1}{\sqrt{N}}, \qquad (17)$$

where $r'$ runs over all the other bases and $a > 1$ is a chosen scalar, for at least one pair of eigenvectors $i, j$ from bases $r$ and $r'$, respectively. If necessary, this procedure was iterated self-consistently. We refer to the resulting bases as mutually biased measurement bases (MBB).

V. NUMERICAL IMPLEMENTATION 

In the Bloch vector parametrization, the maximum of the likelihood corresponding to the Lagrangian function ([4]) can be found by solving for the roots of the nonlinear system of $N^2 + 2N - 3$ equations in $N^2 + 2N - 3$ unknowns $\theta_i, \gamma_j, \lambda_k$. The number of constraints and hence unknowns will differ in other parametrizations; for example, for parameterizations where positive semidefiniteness is implicit in the parametrization (such as the Cholesky parametrization), $\lambda_j = 0$, $j = 1, \dots, N - 1$ in equation ([4]). The Newton-Raphson algorithm can be used to find the roots of this nonlinear system. Writing $\nabla \mathcal{L} = H(t)$, the Newton step for

$$H(t) = 0$$

is

$$t_{new} = t_{old} + \delta t,$$

with $\delta t = -J^{-1} H$, where $J_{ij} = \partial H_i / \partial t_j$ is the Jacobian matrix. Denoting the rows of $H$ by $H_i$, we have



ClOi OUi 



i?^.+Ar+,_2(A, 7) = ^^^^.^'^'""^ = 2A,7, - K J < - 1. 



In order to facilitate global convergence of the Newton-Raphson algorithm, the "sum-of-squares" function $h = H \cdot H$ is evaluated after each iteration, and the step length is progressively shortened until the value of this function is found to decrease (the existence of such a step length is guaranteed) [17].
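
The step-shortening strategy can be sketched on a toy problem. The example below is not the likelihood system of this paper: it applies a Newton step with backtracking on $h = H \cdot H$ to a hypothetical two-equation system, $H(t) = (t_0^2 + t_1^2 - 1,\; t_0 - t_1)$, chosen only because its root is known in closed form.

```python
import math

def solve_2x2(J, b):
    """Solve J s = b for a 2x2 matrix J by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [
        (b[0] * J[1][1] - J[0][1] * b[1]) / det,
        (J[0][0] * b[1] - b[0] * J[1][0]) / det,
    ]

def damped_newton(H, J, t0, tol=1e-24, max_iter=50):
    """Newton-Raphson with the step halved until h = H . H decreases."""
    t = list(t0)
    for _ in range(max_iter):
        Ht = H(t)
        h_old = sum(v * v for v in Ht)
        if h_old < tol:
            break
        step = solve_2x2(J(t), [-v for v in Ht])   # delta t = -J^{-1} H
        scale = 1.0
        while True:
            trial = [ti + scale * si for ti, si in zip(t, step)]
            h_new = sum(v * v for v in H(trial))
            if h_new < h_old or scale < 1e-10:
                break
            scale *= 0.5                            # shorten the step
        t = trial
    return t

# Toy system (illustrative only).
def H(t):
    return [t[0] ** 2 + t[1] ** 2 - 1.0, t[0] - t[1]]

def J(t):
    return [[2.0 * t[0], 2.0 * t[1]], [1.0, -1.0]]

root = damped_newton(H, J, [2.0, 0.5])
```

For this starting point the iteration converges to $t_0 = t_1 = 1/\sqrt{2}$.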

Alternatively, the "sum-of-squares" function $h(t)$ may be minimized directly to locate the constrained maximum of the likelihood. In general, direct minimization of this function may be prone to encountering local traps. In the present case, minimization using an optimization algorithm capable of escaping from traps was employed. The quasi-Newton algorithm, in which the algorithmic step $k + 1$ is given by $t^{(k+1)} - t^{(k)} = -A_k^{-1} \nabla h(t^{(k)})$, where $A_k^{-1}$ denotes the approximate inverse Hessian computed with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update, was first used to search for a zero of $\nabla h(t)$ until convergence slowed below a specified stepwise tolerance, again using an adaptive line search strategy to identify the optimal step size. Traps were often encountered that could not be escaped from using the above technique. To surmount them, a fixed number of stochastic simulated annealing steps were applied.

In order to have a scalar measure of the accuracy of the entire density matrix estimate, the Jozsa fidelity (generalized overlap) $F = \mathrm{Tr}\{\sqrt{\sqrt{\rho}\,\hat\rho\,\sqrt{\rho}}\}$, which is related to the statistical distance on the space of density matrices, was used. The convergence tolerance (i.e., the objective function value below which the optimized parameter estimate was accepted as an estimator) for each $\rho$ was chosen by running a set of ML optimizations with a very large sample size ($m = 10000$ observations), and determining the objective function value below which $F > 0.999$ for all cases. In order to make the convergence tolerance compatible across different sample sizes, the log of the unconstrained likelihood function was scaled as $\frac{1}{m} \ln L(\theta \mid x)$.

Kernel density estimators (KDEs), nonparametric density estimators that avoid some of the deficiencies of histograms, were used to estimate finite sample probability density functions. Unlike histograms, they are smooth. KDEs center a kernel function at each data point; the contribution of data point $x(i)$ to the estimate at $x^*$ depends on $x^* - x(i)$. The estimated density takes the form

$$\hat{f}(x^*) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x^* - x(i)}{h}\right)$$

with $\int K(t)\, dt = 1$. Bandwidth ($h$) optimization, which
avoids values of h that lead to spiky estimates (under- 
smoothing) or oversmoothing, was based on minimiza- 
tion of the asymptotic mean integrated squared error 
(AMISE) [Hi, i.e., hopt = arg min AMISE. A Gaussian 



kernel, $K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} u^2\right)$, was used.
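
A minimal sketch of a Gaussian kernel density estimator is given below. The bandwidth rule is an assumption: it uses the normal-reference value $h = 1.06\,\hat\sigma\, n^{-1/5}$, which minimizes the AMISE when the underlying density is itself Gaussian, and may differ from the bandwidth selection actually used for the figures. The data values are made up for illustration.

```python
import math

def gaussian_kernel(u):
    """K(u) = exp(-u^2 / 2) / sqrt(2 pi); integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_reference_bandwidth(data):
    """AMISE-optimal bandwidth under a Gaussian reference density (assumption)."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * sd * n ** (-0.2)

def kde(x_star, data, h):
    """Estimated density at x_star: (1 / (n h)) sum_i K((x_star - x_i) / h)."""
    return sum(gaussian_kernel((x_star - x) / h) for x in data) / (len(data) * h)

# Hypothetical parameter estimates from repeated samples.
data = [-0.60, -0.50, -0.45, -0.40, -0.30]
h = normal_reference_bandwidth(data)

# Riemann-sum check that the estimated density integrates to about one.
step = 0.001
grid = [-3.0 + step * i for i in range(6001)]
area = sum(kde(x, data, h) for x in grid) * step
```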

Codes implementing the above algorithms, including 
both optimization and hypothesis testing, are available 
upon request from the author. 








VI. SIMULATION RESULTS 

A. Simulation Design 

We consider several Monte Carlo simulation environments in order to assess the asymptotic and finite-sample properties of the maximum likelihood estimators of the density matrix and test statistics for various physical quantities of interest. For both spin-1/2 and spin-1 systems, we report results for full rank, nondegenerate mixed states. The true $\rho$'s (see Tables I, II) are randomly chosen and, hence, the results may be considered representative for any true underlying density matrix⁴. For each $\rho$, 1000 hypothetical samples⁵ of i.i.d. quantum observations, each of size⁶ $m = 100, 400, 1000$, were simulated with the observations evenly distributed between the $N + 1$ measurement bases. For simulating quantum observations from a given basis $r$, the multinomial distribution probabilities $(p_1^r, \dots, p_N^r)$ were computed as $p_i^r = \mathrm{Tr}(\rho(\theta) F_i^r)$, where $F_i^r$ denotes the observable associated with the multinomial outcome $i$ obtained in draw $k$ from basis $r$. The parameters of $\rho$ were then estimated using the maximum likelihood approach for each sample. Among the types of possible optimal measurement strategies, mutually unbiased measurement bases (MUB) minimize the (asymptotic) estimation error across the widest range of Hilbert space dimensions and true $\rho$'s (as discussed in Section III D). MUBs are therefore used for the simulations in Section VI B, and the performance of suboptimal MBB bases is then compared to the former in Section VI C.
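
The sampling step can be sketched for the spin-1/2 case as follows. This is an illustration under stated assumptions, not the code used for the reported results: the three MUB bases are taken to be the Pauli eigenbases, so the outcome probabilities in basis $j$ are $(1 \pm \theta_j)/2$, observations that do not divide evenly among the bases are dropped, and the function name and seed handling are hypothetical.

```python
import random

def simulate_qubit_counts(theta, m, seed=0):
    """Draw m // 3 Born-rule observations in each of the three Pauli bases.

    Returns a dict mapping axis -> (n_plus, n_minus).
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    per_basis = m // 3
    counts = {}
    for axis, t in zip(("x", "y", "z"), theta):
        p_plus = (1.0 + t) / 2.0       # Tr(rho F_+) in this basis
        n_plus = sum(1 for _ in range(per_basis) if rng.random() < p_plus)
        counts[axis] = (n_plus, per_basis - n_plus)
    return counts

theta_true = [-0.44, -0.02, 0.19]      # true values listed in Table I
counts = simulate_qubit_counts(theta_true, m=400, seed=1)

# One hypothetical repeated-sampling experiment: q samples of size m.
q = 20
samples = [simulate_qubit_counts(theta_true, m=400, seed=s) for s in range(q)]
```

Each element of `samples` would then be passed to the likelihood maximization to produce one draw from the finite-sample distribution of the estimator.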



B. Estimation Results for Optimal Measurements 

The primary goal in this subsection is to determine 
whether the limiting asymptotic normal distribution of 



"* We considered other choices of the true density matrix than the 

ones reported in this section, and the results are not sensitive to the choice of the true $\rho$.
⁵ See Section VI B for a discussion of the choice of the number of hypothetical samples.
⁶ These choices of sample sizes enable us to assess the impact of the sample size on the properties of the parameter estimates for experimentally realistic scenarios.



FIG. 1: Finite sample distributions of 9i, mixed spin-1/2 
system, MUB bases, for various numbers q of hypothetical 
simulated samples. (A) m=100; (B) m=400. In each panel, 
the finite sample distributions for q=100, 500, 1000, and 1500 
simulations are shown. 



the parameter estimates provides a good approximation 
to the finite-sample distribution when MUB's are used 
to generate quantum observations, and how the approx- 
imation improves with increase in the sample size. 

Note that frequentist inferential methods are based on the notion of hypothetical repeated samples. Hence, an important practical consideration when assessing the finite-sample performance of these methods is the choice of the number of simulated samples. In particular, the number of samples should be large enough that the finite-sample statistics adequately characterize the corresponding population quantities and the finite-sample distributions are sufficiently smooth. To obtain such a choice, Figure 1 compares the finite-sample distributions for the ML estimator of $\theta_1$ for the mixed spin-1/2 system for 100, 500, 1000, and 1500 simulated samples. Figure 1 reveals that the finite-sample performance of the ML estimator is quite insensitive to the choice of the number of simulated samples. Note that the finite-sample pdf's for different numbers $q$ of simulated samples are barely distinguishable. Therefore, in all subsequent analysis, we set the number of hypothetical samples to 1000.

We next turn our attention to mixed spin-1/2 systems ($N = 2$). Table II reports statistics from the asymptotic and finite-sample distributions of the ML estimators of the parameters and diagonal elements of the density matrix. Panels A, B, and C report results for $m = 100$, 400, and 1000, respectively. In particular, we report the following statistics from the finite-sample distribution: Bias, Standard Deviation (std), Root Mean Square Error (RMSE), and 95% confidence intervals. These statistics broadly summarize a probability distribution and are defined as follows. Let $\hat\theta_i^{(j)}$ denote the estimator of $\theta_i$ in sample $j$, $j = 1, \dots, q$. We have

$$\mathrm{Bias}(\hat\theta_i) = \bar\theta_i - \theta_{i,0}, \qquad \mathrm{std}(\hat\theta_i) = \left(\frac{1}{q} \sum_{j=1}^{q} \left(\hat\theta_i^{(j)} - \bar\theta_i\right)^2\right)^{1/2},$$

where $\bar\theta_i = \frac{1}{q} \sum_{j=1}^{q} \hat\theta_i^{(j)}$,

$$\mathrm{RMSE}(\hat\theta_i) = \left(\frac{1}{q} \sum_{j=1}^{q} \left(\hat\theta_i^{(j)} - \theta_{i,0}\right)^2\right)^{1/2} = \left[\mathrm{Bias}(\hat\theta_i)^2 + \mathrm{std}(\hat\theta_i)^2\right]^{1/2},$$

and the 95% confidence interval is the region bounded by the 2.5% and 97.5% quantiles of the finite-sample distribution of $\hat\theta_i$.
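
These summary statistics are straightforward to compute from a list of repeated-sample estimates. The sketch below uses made-up estimates and a simple order-statistic convention for the quantiles (an assumption; other quantile conventions give slightly different interval endpoints), and it checks the decomposition of the RMSE into bias and standard deviation.

```python
import math

def finite_sample_stats(estimates, true_value):
    """Bias, std, RMSE and 95% interval of an estimator over q repeated samples."""
    q = len(estimates)
    mean = sum(estimates) / q
    bias = mean - true_value
    std = math.sqrt(sum((e - mean) ** 2 for e in estimates) / q)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / q)
    ordered = sorted(estimates)
    lower = ordered[int(math.floor(0.025 * (q - 1)))]   # 2.5% quantile
    upper = ordered[int(math.ceil(0.975 * (q - 1)))]    # 97.5% quantile
    return {"bias": bias, "std": std, "rmse": rmse, "ci": (lower, upper)}

# Hypothetical estimates of theta_1 from five repeated samples.
estimates = [-0.50, -0.40, -0.45, -0.30, -0.60]
stats = finite_sample_stats(estimates, true_value=-0.44)
```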

Note that the diagonal elements of the density matrix are smooth real functions of the parameter vector $\theta$. Let $\rho_{ij}(\theta)$ denote the $ij$-th element of the density matrix. Given the Invariance property of the ML estimator (see Section II), the maximum likelihood estimator of $\rho_{ij}(\theta)$ is $\rho_{ij}(\hat\theta)$. Also, the asymptotic distribution of the estimators of the diagonal elements of $\rho$ can be obtained using the Continuous Mapping Theorem: $\sqrt{m}\,(\rho_{ij}(\hat\theta) - \rho_{ij}(\theta_0))$ has an asymptotic normal distribution with asymptotic variance given by

$$\mathrm{var}\left(\rho_{ij}(\hat\theta)\right) = \left(\frac{\partial \rho_{ij}(\theta_0)}{\partial \theta_0}\right)^T \Sigma \left(\frac{\partial \rho_{ij}(\theta_0)}{\partial \theta_0}\right), \qquad (18)$$

where $\Sigma$ is the asymptotic variance of $\hat\theta$.
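
As a worked instance of equation (18) for $N = 2$, note that $\rho_{11}(\theta) = (1 + \theta_3)/2$ in the Pauli parameterization, so its gradient is $(0, 0, 1/2)$ and the asymptotic variance reduces to $\Sigma_{33}/4$. The sketch below evaluates the quadratic form for a hypothetical covariance matrix; the numerical entries of `sigma` are illustrative and are not estimates from this paper.

```python
def quadratic_form_variance(grad, sigma):
    """Evaluate grad^T Sigma grad, the right-hand side of equation (18)."""
    n = len(grad)
    return sum(grad[i] * sigma[i][j] * grad[j]
               for i in range(n) for j in range(n))

# Gradient of rho_11 = (1 + theta_3) / 2 with respect to (theta_1, theta_2, theta_3).
grad_rho11 = [0.0, 0.0, 0.5]

# Hypothetical asymptotic covariance matrix of the parameter estimates.
sigma = [
    [0.04, 0.00, 0.01],
    [0.00, 0.05, 0.00],
    [0.01, 0.00, 0.16],
]

var_rho11 = quadratic_form_variance(grad_rho11, sigma)   # equals sigma[2][2] / 4
```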

Row 1 of Panel A reports the true values of the param- 
eters and the diagonal elements of the density matrix. 
The row labeled "Asymptotic" in each panel reports the 
point estimates of the parameters and diagonal elements, 
along with the asymptotic standard errors in parenthe- 
ses and asymptotic 95% confidence intervals in square 
brackets. We estimate the asymptotic covariance ma- 
trix consistently using the observed Fisher information 
given by equation ([1]). As mentioned above, the asymp- 
totic distribution of the estimators of the diagonal ele- 
ments of p are obtained using the Continuous Mapping 
Theorem. For the computation of the asymptotic dis- 
tribution, 1 out of the 1000 hypothetical samples was 
selected randomly. 

Note that the asymptotic standard errors are very small, ranging from 0.004 for $\theta_1$ to 0.02 for $\theta_2$ and $\theta_3$ in Panel A. Consequently, the asymptotic 95% confidence intervals are tightly centered around the point estimates. The asymptotic standard errors decrease with the increase in sample size to $m = 400$ and $m = 1000$ in Panels B and C, respectively. Consequently, as predicted by asymptotic theory, the distributions get narrower with increase in the sample size. However, note that, with the exception of $\theta_2$ in Panel A, none of the asymptotic confidence intervals contain the true value of the parameter.



The subsequent rows of each Panel report statistics from the finite-sample distribution of the parameter estimates. The Table reveals that the finite-sample biases are negligible, even for small sample sizes, ranging from -0.0055 for $\theta_3$ to 0.0027 for $\rho_{22}$ for $m = 100$ in Panel A. However, the finite-sample standard errors reveal that the asymptotic standard errors grossly underestimate the estimation error in finite samples. The finite-sample standard errors vary from 0.09 for $\rho_{11}$ to 0.19 for $\theta_2$ for $m = 100$. These are about an order of










FIG. 2: Finite sample distributions of parameter estimates, 
mixed spin- 1/2 system, MUB bases. Each Panel reports re- 
sults for one parameter of the density matrix and superim- 
poses results from estimations using sample sizes 100, 400, 
and 1000. The finite sample distributions were computed 
from 1000 simulations. 



magnitude bigger than the corresponding asymptotic es- 
timates. Consequently, the finite-sample confidence in- 
tervals are substantially wider than the asymptotic ones 
and do contain the true value of the parameter. Thus, 
the asymptotic confidence intervals grossly undercover 
in finite samples, thereby rendering any inference based 
on the asymptotic distribution of the parameters unreliable. While the increase in the sample size to 400 and 1000 in Panels B and C, respectively, reduces the finite sample standard errors, the rate of convergence is much slower than the theoretically predicted rate $\sqrt{m}$.

To further illustrate the discrepancy between the 
asymptotic and the finite-sample properties of the pa- 
rameter estimates and how the asymptotic approxima- 
tion improves with increase in the sample size, Figure 
2 plots the finite sample distributions of the parameter 
estimates for sample sizes m = 100, 400, and 1000 in 
the same graph. We also include in the graph the cor- 
responding asymptotic distributions which are degener- 
ate at the true values of the parameters. As expected, 
the finite-sample distributions get narrower and more 
concentrated around the true value with increase in the 
sample size. 

While Figure 2 focuses on the consistency property of 
the ML estimators, the quality of the asymptotic normal 
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Panel A. 


Sample size 100

                 θ1               θ2               θ3               ρ11              ρ22
True Value       -0.44            -0.02            0.19             0.59             0.41
Asymptotic       -0.45            0.03             0.09             0.55             0.45
                 (0.004)          (0.02)           (0.02)           (0.01)           (0.01)
                 [-0.46 -0.45]    [-0.00 0.06]     [0.06 0.12]      [0.53 0.56]      [0.44 0.47]
Bias (x10^2)     -0.40            -0.14            -0.55            -0.27            0.27
Standard error   0.16             0.19             0.17             0.09             0.09
RMSE             0.16             0.19             0.17             0.09             0.09
95% CI           [-0.76 -0.09]    [-0.39 0.33]     [-0.15 0.52]     [0.42 0.77]      [0.23 0.58]

Panel B: Sample size 400

                 θ1               θ2               θ3               ρ11              ρ22
Asymptotic       -0.49            -0.14            0.31             0.65             0.35
                 (0.004)          (0.004)          (0.004)          (0.002)          (0.002)
                 [-0.50 -0.48]    [-0.15 -0.13]    [0.30 0.32]      [0.65 0.66]      [0.34 0.35]
Bias (x10^2)     -0.33            -0.17            0.24             0.12             -0.12
Standard error   0.08             0.09             0.09             0.05             0.05
RMSE             0.08             0.09             0.09             0.05             0.05
95% CI           [-0.61 -0.26]    [-0.19 0.16]     [0.02 0.37]      [0.51 0.76]      [0.24 0.49]

Panel C: Sample size 1000

                 θ1               θ2               θ3               ρ11              ρ22
Asymptotic       -0.48            -0.03            0.22             0.61             0.39
                 (0.002)          (0.002)          (0.002)          (0.001)          (0.001)
                 [-0.48 -0.47]    [-0.042 -0.036]  [0.216 0.222]    [0.608 0.611]    [0.389 0.392]
Bias (x10^2)     -0.46            0.00             0.36             0.18             -0.18
Standard error   0.06             0.07             0.07             0.04             0.04
RMSE             0.06             0.07             0.07             0.04             0.04
95% CI           [-0.56 -0.35]    [-0.14 0.09]     [0.09 0.32]      [0.54 0.77]      [0.22 0.46]

TABLE I: Finite sample distribution statistics (1000 repeated samples) for state estimation of spin-1/2 quantum systems: MUB measurement bases. Asymptotic standard errors are given in parentheses and 95% confidence intervals in square brackets.



approximation is considered in Figures 3 and 4. Figure 3 plots the finite-sample and asymptotic distributions of $\sqrt{m}\,(\hat\theta_i - \theta_{i,0})$. Panels A-C plot the distributions for $\theta_1$ for $m = 100, 400, 1000$, respectively, while Panels D-F do the same for $\theta_2$.

Panels A-C in Figure 4 plot asymptotic and finite-sample distributions of $\sqrt{m}\,(\rho_{11}(\hat\theta) - \rho_{11}(\theta_0))$ for the upper diagonal element of the density matrix, for $m = 100, 400$, and 1000, respectively. Note that the finite sample distributions are substantially wider than the corresponding asymptotic ones for all three sample sizes. In fact, the finite sample distributions get wider with increase in sample size, indicating that the finite sample standard errors decrease at a rate slower than $1/\sqrt{m}$.

The analysis so far has been restricted to spin-1/2 
quantum systems. Most proposed applications of state 
estimation, including quantum information processing 
models, generally involve higher dimensional systems. In 
order to assess the impact of Hilbert space dimension on 
the performance of frequentist inference, we investigated 
the state estimation of spin-1 systems. Table III reports 
the asymptotic and finite-sample behavior of the ML 
estimators of the parameters and diagonal elements of 
the density matrix for a spin-1 quantum system. As in 
Table II, Panels A, B, and C report results for $m = 100$,
400, and 1000, respectively. 

Note that the divergence between the asymptotic and 
finite-sample performance of the estimators is more pro- 
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FIG. 3: Finite sample distributions of y'rn{9i — 9i.o), mixed 
spin-1/2 state, MUB bases. Panels A-C: $\theta_1$. (A) m = 100;
(B) m = 400; (C) m = 1000. Panels D-F: $\theta_2$. (D) m = 100;
(E) m = 400; (F) m = 1000. In each panel, the finite sample
distributions (1000 simulations) are shown alongside the cor- 
responding asymptotic distributions. Asymptotic variances 
are estimated from one of the 1000 repeated samples (chosen 
at random). 
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— sampled |m = 400 ) 







FIG. 4: Finite sample distributions of $\sqrt{m}\,(\rho_{11}(\hat\theta) - \rho_{11}(\theta_0))$, for mixed spin-1/2 state, MUB bases. (A) m = 100; (B) m = 400; (C) m = 1000. In each panel, finite sample distributions (1000 simulations) are shown alongside the corresponding asymptotic distribution.

nounced for the spin-1 system, potentially because of the 
increase in the number of parameters and nonlinearity 
of the model. The finite-sample biases are large for some 
of the parameters including 0.05 for 6*8, 0.03 for pn, and 
—0.03 for for m — 100. However, the biases for the 
above parameters are reduced by an order of magnitude 
with increase in sample size and, hence, are negligible 
for larger sample sizes. 

The finite-sample standard errors are also substantial 
for $m = 100$, ranging from 0.15 for $\theta_1$ to 0.17 for $\theta_8$.
Moreover, these vary from being about 8 to 9 times big- 
ger than the corresponding asymptotic standard errors. 
The small sample standard errors decrease with increase 
in the sample size - the standard errors in Panel B are 
about 1/2 and those in Panel C about 1/3 of those in 
Panel A. This is also shown in Figure 5, where each panel 
plots the finite sample distribution of the ML estimator 
of one parameter for sample sizes m = 100, 400, and 
1000 in the same graph. 

However, as in the case of spin-1/2 systems, the discrepancy between the asymptotic and finite-sample standard errors does not diminish with sample size. Although the finite-sample standard errors decrease with sample size and, consequently, the confidence intervals get shorter, the rate of convergence is much slower than the theoretically predicted rate $\sqrt{m}$. This is also revealed in Figures 6 and 7, which plot, respectively, the finite-sample and asymptotic distributions of $\sqrt{m}\,(\hat\theta_i - \theta_{i,0})$ and $\sqrt{m}\,(\rho_{ii}(\hat\theta) - \rho_{ii}(\theta_0))$. Panels A-C in Figure 6 (7) plot the distributions for $\theta_7$ ($\rho_{11}$) for $m = 100, 400$, and 1000, respectively, while Panels D-F report the same for $\theta_8$ ($\rho_{22}$). Note that the finite-sample distribution gets wider with increase in the sample size. This trend continues even for very large sample sizes. For example, for sample sizes 10000 and 20000 in the spin-1/2 system under consideration, the finite sample standard errors for $\theta_1$ were 0.017 and 0.0092, respectively, whereas representative asymptotic standard errors were $1.5 \times 10^{-4}$ and $7.8 \times 10^{-5}$ - two orders of magnitude smaller. For very large sample sizes, computational likelihood maximization can become prohibitively expensive even if experimental data collection is not.

Another serious shortcoming of frequentist methods, 
which unlike Bayesian methods are based on constrained 
optimization, is that local traps in the likelihood land- 
scape can result in the optimization algorithm converg- 
ing to points in the parameter space where all the con- 
straints on the density matrix are not precisely satisfied; 
in the Bloch vector parametrization, this phenomenon 
can result in negative eigenvalues. Parameterizations
that automatically satisfy the positive semidefiniteness 
constraints (such as the Cholesky parametrization) do 
not have this problem, but may not satisfy the unit 
trace condition. In the likelihood optimizations carried 
out here, because of the nonlinear nature of the con- 
straints, the parameter estimates did produce negative 
eigenvalues; the problem was most pronounced for p's 
with eigenvalues near 0 or 1 and small sample sizes.
Because the local minima can be far from the global 
optimum, this can result in multimodality of the esti- 
mate distributions in addition to unphysical parameter 
estimates. This is shown in Figure 8, which plots the 
raw distributional results obtained from the optimiza- 
tion procedure, without elimination of unphysical p es- 
timates that have negative eigenvalues. The Figure also 
shows how the standard errors of the raw estimate dis- 
tributions are increased by the presence of unphysical 
estimates. In certain cases, restarting the optimization 
from a different initial guess can circumvent local op- 
tima, but this method is not always successful. In the 
present work, we filtered the estimate distributions to 
exclude estimations producing negative eigenvalues orig- 
inating from the optimization algorithm being trapped, 
since the resulting multimodality masks the true perfor- 
mance of the estimator. Note that Bayesian methods, 
which depend on conditional simulation, do not produce 
unphysical parameter estimates. 

C. Effect of measurement bases 

The above results were all obtained for simulated samples
employing MUB. As noted in Section III D, the MUB set
maximizes the average Fisher information, or,

FIG. 5: Finite sample distributions of parameter estimates, spin-1 system, MUB bases. Each panel reports results for one
parameter of the density matrix and superimposes results from estimations using sample sizes 100, 400, and 1000. The finite
sample distributions were computed from 1000 simulations.

FIG. 6: Finite sample distributions of $\sqrt{m}(\hat{\theta}_i - \theta_{i,0})$, mixed
spin-1 state, MUB bases. Panels A-C: $\theta_7$. (A) m = 100; (B)
m = 400; (C) m = 1000. Panels D-F: $\theta_8$. (D) m = 100;
(E) m = 400; (F) m = 1000. In each panel, the finite sample
distributions (1000 simulations) are shown alongside the
corresponding asymptotic distributions. Asymptotic variances
are estimated from one of the 1000 repeated samples (chosen
at random).

FIG. 7: Finite sample distributions of $\sqrt{m}(\rho_{ii}(\hat{\theta}) - \rho_{ii}(\theta_0))$,
for mixed spin-1 state, MUB bases. Panels A-C: $\rho_{11}$. (A)
m = 100; (B) m = 400; (C) m = 1000. Panels D-F: $\rho_{22}$. (D)
m = 100; (E) m = 400; (F) m = 1000. In each panel, finite
sample distributions (1000 simulations) are shown alongside
the corresponding asymptotic distribution.

FIG. 8: Unfiltered finite sample distributions of parameter estimates, spin-1 system, MUB bases. Each panel reports results
for one parameter of the density matrix and superimposes results from estimations using sample sizes 100, 400, and 1000.
The finite sample distributions were computed from 1000 simulations. Unlike the other Figures, physically inadmissible $\rho$
estimates with negative eigenvalues were not removed from the finite sample distributions.



equivalently, minimizes the asymptotic covariance matrix
of the estimators, over the set of all possible density
matrices. However, implementing MUB measurements
in the laboratory is quite difficult for N > 2.

In order for all the parameters of the density matrix
to be identifiable by frequentist inference, measurements
in at least N + 1 bases are required. Most laboratory
setups use > N + 1 redundant bases and are suboptimal
from the standpoint of average-case Fisher information.
Hence, it is crucial to determine whether suboptimal
measurement bases achieve similar estimation accuracies
in finite samples. In this work, suboptimal measurement
bases were generated according to the method described
in Section IV C, and are reported in the Appendix; a in
equation (17) was set to 1.2. The primary goal of this
subsection is to compare the asymptotic and finite sample
relative efficiencies across optimal and suboptimal
measurement strategies.

Table III reports the asymptotic and finite-sample
performance of the ML estimators using the suboptimal
bases for N = 3. Panels A, B, and C report results
for m = 100, 400, and 1000, respectively. We begin
with a comparison of the asymptotic relative efficiencies
of MUB and MBB bases. The asymptotic standard errors
vary from 0.01 for $\rho_{22}$ and $\rho_{33}$ to 0.04 for $\theta_6$
in Panel A. Comparing with the results obtained with
MUB in Table II reveals that the asymptotic standard
errors for the suboptimal measurement strategy are bigger
for most parameters than those for the MUB, as
expected based on the asymptotic theory. The asymptotic
standard errors decrease and, hence, the confidence
intervals become shorter, with the increase in sample
size. We next turn to a comparison of the relative
finite-sample efficiencies of frequentist estimators using
these measurement strategies. Table III reveals that the
finite-sample standard errors for the parameters and
diagonal elements of the density matrix are generally
bigger for the suboptimal measurement strategy relative to
the MUB, especially for larger sample sizes. For m = 100,
the finite-sample standard errors range from 0.07 for
$\rho_{22}$ to 0.26 for $\theta_6$ for the suboptimal strategy,
compared to a range from 0.08 for $\rho_{11}$ to 0.17 for
$\theta_8$ for the MUB. While an increase in the sample size
reduces the finite sample standard errors, they still remain
large, ranging from 0.04 to 0.31 for m = 400 and from 0.03 to
0.29 for m = 1000, bigger than the corresponding values
obtained with the MUB. Consequently, the finite-sample
confidence intervals are wider for the suboptimal bases
relative to the MUB for all three sample sizes. Thus,
the ML estimators using MUB also have superior
finite-sample properties relative to those using suboptimal
bases, although the difference can be marginal for small
sample sizes. Note that while the rate of convergence to
the asymptotic limit is smaller than the theoretically
FIG. 9: Distributions of $\sqrt{m}(\hat{\theta}_i - \theta_{i,0})$ for a spin-1 system,
sample size 100: comparison of MUB and MBB bases. (A)
$\sqrt{m}(\hat{\theta}_7 - \theta_{7,0})$; (B) $\sqrt{m}(\hat{\theta}_8 - \theta_{8,0})$. Panels C and D: magnified
comparison of asymptotic variances obtained for MUB and
MBB bases in A and B, respectively. MBB bases used are
listed in Appendix A.



FIG. 10: Distributions of $\sqrt{m}(\hat{\theta}_i - \theta_{i,0})$ for a spin-1 system,
sample size 1000: comparison of MUB and MBB bases. (A)
$\sqrt{m}(\hat{\theta}_7 - \theta_{7,0})$; (B) $\sqrt{m}(\hat{\theta}_8 - \theta_{8,0})$. Panels C and D: magnified
comparison of asymptotic variances obtained for MUB and
MBB bases in A and B, respectively. MBB bases used are
listed in Appendix A.



predicted $\sqrt{m}$ in both cases, it is somewhat greater for
MUB than for the suboptimal bases.

Comparing the relative asymptotic efficiencies of
the MUB versus suboptimal measurement strategies
($\mathrm{RE}_{1:2}$) with the corresponding relative
finite-sample efficiencies, we find that for small sample sizes,
the former are larger for most of the parameters and
diagonal elements of the density matrix. For bigger sample
sizes, due to the greater rate of convergence of the
efficiency of MUB bases to the asymptotic limit, the
finite sample RE's are sometimes slightly larger than the
asymptotic RE's. These results suggest that although
frequentist inferential methods using MUB are theoretically
predicted to outperform suboptimal measurement strategies,
the relative improvement is strongly dependent on the
sample size and is much less pronounced in finite samples.
This feature was also observed for measurement bases that
differed more substantially from MUB, such as randomly
generated bases (data not shown). The theoretically
predicted advantages of the optimal measurement strategy
can be dramatically diminished for experimentally realistic
sample sizes, raising doubts about the practical utility in
frequentist inference of these types of measurements, which
are difficult to implement experimentally for Hilbert space
dimensions N > 2.

Figures 9 and 10 compare asymptotic and finite-sample
properties of the optimal MUB and the suboptimal bases.



D. Hypothesis Testing 

An important application of quantum state estima- 
tion is optimal decision making and control, and by ex- 
tension, quantum information processing. Control logic 
is often Boolean, in that decisions are made based on 
whether a hypothesis is true or false, rather than the pre- 
cise state of the system. Therefore, in this subsection, 
we examine the finite-sample performance of maximum 
likelihood-based hypothesis testing procedures that aid 
in making decisions. 

Hypothesis testing problems may involve a single
restriction or multiple restrictions on the parameters. We
consider first the case of single restrictions. The
appropriate hypothesis test for testing a single restriction
is the t-test. The t-test statistic for the null hypothesis
$H_0: f(\theta) = c$ against the alternative hypothesis
$H_1: f(\theta) \neq c$, where c is a known constant, is

$$t = \frac{1}{\hat{\sigma}(f(\hat{\theta}))}\left(f(\hat{\theta}) - c\right), \tag{19}$$

where $\hat{\sigma}(f(\hat{\theta}))$ is a consistent estimate of the asymptotic
standard error of $f(\hat{\theta})$. Asymptotically, under the null
hypothesis, the t-statistic converges in distribution to
the standard normal distribution; thus, for a two-sided
size $\alpha = 0.05$ test, the null hypothesis is rejected if
$|t| > 1.96$.
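The decision rule built on equation (19) can be written out in a few lines. The sketch below assumes that the point estimate $f(\hat{\theta})$ and its estimated asymptotic standard error are already available from the likelihood optimization; the function names and the numerical inputs are illustrative assumptions.

```python
def t_statistic(f_hat, c, se_hat):
    """Equation (19): t = (f(theta_hat) - c) / sigma_hat(f(theta_hat))."""
    if se_hat <= 0:
        raise ValueError("standard error must be positive")
    return (f_hat - c) / se_hat

def reject_null(t, crit=1.96):
    """Two-sided test of asymptotic size 0.05: reject when |t| > 1.96."""
    return abs(t) > crit

# Hypothetical inputs: estimate 0.24 with asymptotic standard error 0.02.
t_far = t_statistic(0.24, 0.15, 0.02)   # null value far from the estimate
t_near = t_statistic(0.24, 0.22, 0.02)  # null value within one standard error
```

Because the statistic divides by the estimated standard error, any underestimate of that error inflates $|t|$ and the rejection frequency directly.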



Panel A: Sample size 100

Parameter      True value   Asymptotic: estimate (s.e.) [CI]   Bias (x10^2)   Standard error   RMSE   95% CE
$\theta_1$     ...          0.24 (0.02) [0.21 0.27]            -0.55          0.15             0.15   [-0.19 0.44]
$\theta_2$     -0.14        -0.05 (0.02) [-0.08 -0.02]         2.20           0.15             0.15   [-0.39 0.18]
$\theta_3$     -0.07        -0.44 (0.01) [-0.46 -0.42]         1.90           0.14             0.14   [-0.36 0.20]
$\theta_4$     -0.04        -0.04 (0.02) [-0.07 -0.01]         0.89           0.15             0.15   [-0.31 0.29]
$\theta_5$     -0.15        -0.02 (0.02) [-0.05 0.01]          -0.22          0.15             0.15   [-0.46 0.12]
$\theta_6$     -0.01        -0.12 (0.02) [-0.15 -0.09]         -0.69          0.16             0.16   [-0.33 0.28]
$\theta_7$     -0.17        -0.25 (0.02) [-0.29 -0.22]         1.43           0.16             0.16   [-0.46 0.16]
$\theta_8$     -0.23        -0.25 (0.02) [-0.29 -0.22]         5.35           0.17             0.17   [-0.47 0.16]
$\rho_{11}$    0.23         0.04 (0.004) [0.03 0.05]           2.50           0.08             0.09   [-0.14 0.51]
$\rho_{22}$    0.30         0.48 (0.01) [0.46 0.50]            0.59           0.09             0.09   [0.08 0.82]
$\rho_{33}$    0.46         0.48 (0.01) [0.46 0.50]            -3.09          0.10             0.10   [0.17 0.96]

Panel B: Sample size 400

Parameter      Asymptotic: estimate (s.e.) [CI]   Bias (x10^2)   Standard error   RMSE   95% CE
$\theta_1$     0.17 (0.004) [0.16 0.18]           -0.09          0.08             0.08   [-0.02 0.32]
$\theta_2$     -0.22 (0.004) [-0.22 -0.21]        0.32           0.08             0.08   [-0.30 0.03]
$\theta_3$     -0.10 (0.003) [-0.104 -0.091]      -0.31          0.08             0.08   [-0.23 0.07]
$\theta_4$     0.11 (0.004) [0.10 0.12]           -0.32          0.08             0.08   [-0.21 0.13]
$\theta_5$     -0.07 (0.004) [-0.08 -0.07]        0.47           0.08             0.08   [-0.30 0.01]
$\theta_6$     -0.08 (0.004) [-0.09 -0.08]        0.00           0.08             0.08   [-0.16 0.14]
$\theta_7$     -0.14 (0.004) [-0.16 -0.14]        0.26           0.09             0.09   [-0.34 -0.00]
$\theta_8$     ... (0.004) [-0.28 -0.26]          0.35           0.09             0.09   [-0.43 -0.05]
$\rho_{11}$    0.21 (0.002) [0.20 0.21]           -0.05          0.04             0.04   [0.07 0.31]
$\rho_{22}$    0.31 (0.002) [0.30 0.31]           0.26           0.05             0.05   [0.11 0.44]
$\rho_{33}$    0.49 (0.003) [0.48 0.49]           -0.20          0.05             0.05   [0.34 0.79]

Panel C: Sample size 1000

Parameter      Asymptotic: estimate (s.e.) [CI]   Bias (x10^2)   Standard error   RMSE   95% CE
$\theta_1$     ...                                0.04           0.06             0.06   ...
$\theta_2$     ...                                -0.31          0.06             0.06   ...
$\theta_3$     ...                                -0.01          0.05             0.05   ...
$\theta_4$     ...                                -0.16          0.05             0.05   ...
$\theta_5$     ...                                -0.32          0.06             0.06   ...
$\theta_6$     ...                                0.07           0.05             0.05   ...
$\theta_7$     ...                                -0.25          0.06             0.06   ...
$\theta_8$     ...                                -0.21          0.06             0.06   ...
$\rho_{11}$    ... (0.001) [0.178 0.181]          -0.06          0.03             0.03   ...
$\rho_{22}$    0.36 (0.001) [0.36 0.37]           -0.06          0.03             0.03   ...
$\rho_{33}$    0.46 (0.001) [0.45 0.46]           0.12           0.04             0.04   [0.39 0.70]

TABLE II: Finite sample distribution statistics (1000 repeated samples) for state estimation of spin-1 quantum systems: MUB measurement bases.



Panel A: Sample size 100

Parameter      True value   Asymptotic: estimate (s.e.) [CI]   Bias (x10^2)   Standard error   RMSE   95% CE
$\theta_1$     0.15         0.11 (0.03) [0.05 0.18]            -13.5          0.19             0.23   [-0.37 0.35]
$\theta_2$     -0.14        0.28 (0.03) [0.21 0.34]            23.9           0.24             0.34   [-0.42 0.49]
$\theta_3$     -0.07        0.15 (0.02) [0.12 0.18]            6.41           0.13             0.14   [-0.24 0.25]
$\theta_4$     -0.04        -0.10 (0.02) [-0.14 -0.07]         3.31           0.16             0.16   [-0.39 0.27]
$\theta_5$     -0.15        ... (0.02) [0.17 0.25]             11.6           0.22             0.25   [-0.55 0.37]
$\theta_6$     -0.01        0.15 (0.04) [0.07 0.22]            23.9           0.26             0.35   [-0.25 0.66]
$\theta_7$     -0.17        -0.21 (0.03) [-0.27 -0.15]         4.98           0.20             0.20   [-0.48 0.24]
$\theta_8$     -0.23        -0.46 (0.03) [-0.52 -0.41]         8.27           0.19             0.20   [-0.52 0.21]
$\rho_{11}$    0.23         0.27 (0.02) [0.24 0.31]            5.59           0.10             0.11   [-0.02 0.59]
$\rho_{22}$    0.30         0.13 (0.01) [0.11 0.14]            -0.82          0.07             0.07   [0.05 0.54]
$\rho_{33}$    0.46         0.60 (0.01) [0.57 0.63]            -4.77          0.11             0.12   [0.09 0.74]

Panel B: Sample size 400

Parameter      Asymptotic: estimate (s.e.) [CI]   Bias (x10^2)   Standard error   RMSE   95% CE
$\theta_1$     -0.03 (0.006) [-0.05 -0.02]        -12.1          0.16             0.20   [-0.28 0.34]
$\theta_2$     0.37 (0.005) [0.36 0.38]           20.2           0.24             0.32   [-0.38 0.43]
$\theta_3$     -0.02 (0.003) [-0.02 -0.01]        3.31           0.09             0.10   [-0.20 0.14]
$\theta_4$     0.11 (0.004) [0.10 0.12]           -0.37          0.11             0.11   [-0.24 0.16]
$\theta_5$     0.001 (0.004) [-0.007 0.010]       10.3           0.11             0.22   [-0.46 0.29]
$\theta_6$     0.58 (0.003) [0.57 0.58]           23.9           0.31             0.39   [-0.36 0.65]
$\theta_7$     -0.18 (0.009) [-0.19 -0.16]        0.35           0.13             0.13   [-0.43 0.05]
$\theta_8$     -0.12 (0.007) [-0.14 -0.11]        4.22           0.11             0.12   [-0.43 0.03]
$\rho_{11}$    0.29 (0.005) [0.28 0.30]           2.87           0.07             0.07   [0.06 0.43]
$\rho_{22}$    0.31 (0.002) [0.30 0.31]           -0.43          0.04             0.04   [0.17 0.39]
$\rho_{33}$    0.41 (0.003) [0.40 0.41]           -2.44          0.07             0.07   [0.32 0.65]

Panel C: Sample size 1000

Parameter      Asymptotic: estimate (s.e.) [CI]   Bias (x10^2)   Standard error   RMSE   95% CE
$\theta_1$     0.09 (0.002) [0.09 0.10]           -7.87          0.13             0.15   [-0.19 0.31]
$\theta_2$     -0.26 (0.001) [-0.27 -0.26]        14.7           0.21             0.26   [-0.35 0.39]
$\theta_3$     -0.09 (0.001) [-0.092 -0.087]      1.48           0.06             0.06   [-0.16 0.05]
$\theta_4$     -0.10 (0.001) [-0.100 -0.095]      -1.58          0.07             0.07   [-0.19 0.10]
$\theta_5$     -0.13 (0.003) [-0.13 -0.12]        8.48           0.16             0.18   [-0.37 0.24]
$\theta_6$     0.08 (0.003) [0.076 0.089]         19.2           0.29             0.35   [-0.33 0.63]
$\theta_7$     -0.21 (0.003) [-0.22 -0.21]        -1.32          0.09             0.09   [-0.37 -0.01]
$\theta_8$     -0.26 (0.003) [-0.26 -0.25]        2.56           0.08             0.08   [-0.36 -0.06]
$\rho_{11}$    0.21 (0.002) [0.21 0.22]           1.48           0.05             0.05   [0.11 0.34]
$\rho_{22}$    0.30 (0.001) [0.30 0.31]           -0.00          0.03             0.03   [0.26 0.36]
$\rho_{33}$    0.48 (0.001) [0.480 0.483]         -1.48          0.04             0.05   [0.36 0.58]

TABLE III: Finite sample distribution statistics (1000 repeated samples) for state estimation of spin-1 quantum systems: MBB measurement bases. See Appendix
for measurement bases.

Panel A: Sample size 100

                                               t-stat                              Wald stat
Hypothesis                              size   power   crit val             size   power   crit val
$\theta_2 = \theta_{2,0}$               0.87           [-37.8 27.7]
$\theta_2 = 0.5$                               0.97    [-117.3 -38.0]
$\theta_i = \theta_{i,0}$, i = 1,2,3                                        0.98           47.5
...                                                                                0.68    21.4

Panel B: Sample size 400

                                               t-stat                              Wald stat
Hypothesis                              size   power   crit val             size   power   crit val
$\theta_2 = \theta_{2,0}$               0.93           [-50.2 34.7]
$\theta_2 = 0.5$                               1.0     [-171.3 -84.0]
$\theta_i = \theta_{i,0}$, i = 1,2,3                                        1.0            54.1
...                                                                                0.99    31.6

Panel C: Sample size 1000

                                               t-stat                              Wald stat
Hypothesis                              size   power   crit val             size   power   crit val
$\theta_2 = \theta_{2,0}$               0.96           [-66.1 66.4]
$\theta_2 = 0.5$                               1.0     [-363.6 -229.6]
$\theta_i = \theta_{i,0}$, i = 1,2,3                                        1.0            85.7
...                                                                                1.0     53.6

TABLE IV: Finite sample test statistic size, power, and critical values for state estimation of spin-1/2 quantum systems (1000
repeated samples).



Panel A: Sample size 100

                                               t-stat                              Wald stat
Hypothesis                              size   power   crit val             size   power   crit val
$\theta_6 = \theta_{6,0}$               0.89           [-24.6 12.1]
$\theta_6 = 0.5$                               0.95    [-62.3 -24.8]
$S_1 = S_{1,0}$                         0.99           [-19.5 7.0]
$S_1 = 0.5$                                    0.98    [-1.5 14.5]
$\theta_i = \theta_{i,0}$, i = 3,4,6                                        0.0            2.22
$\theta_i = 0.5$, i = 3,4,6                                                        0.0     ...

Panel B: Sample size 400

                                               t-stat                              Wald stat
Hypothesis                              size   power   crit val             size   power   crit val
$\theta_6 = \theta_{6,0}$               0.90           [-39.7 37.4]
$\theta_6 = 0.5$                               1.0     [-193.0 -103.9]
$S_1 = S_{1,0}$                         0.92           [-21.5 35.6]
$S_1 = 0.5$                                    1.0     [-12.3 47.8]
$\theta_i = \theta_{i,0}$, i = 3,4,6                                        0.0            4.48
$\theta_i = 0.5$, i = 3,4,6                                                        0.0     ...

Panel C: Sample size 1000

                                               t-stat                              Wald stat
Hypothesis                              size   power   crit val             size   power   crit val
$\theta_6 = \theta_{6,0}$               0.96           [-81.5 59.4]
$\theta_6 = 0.5$                               1.0     [-453.6 -304.2]
$S_1 = S_{1,0}$                         0.97           [-81.5 52.3]
$S_1 = 0.5$                                    1.0     [-26.4 81.7]
$\theta_i = \theta_{i,0}$, i = 3,4,6                                        0.13           11.3
$\theta_i = 0.5$, i = 3,4,6                                                        0.14    11.4

TABLE V: Finite sample test statistic size, power for state estimation of spin-1 quantum systems (1000 repeated samples).
$S_1$ denotes the first eigenvalue of $\rho(\theta)$. The true eigenvalues of $\rho$ were $S_{1,0} = 0.55$, $S_{2,0} = 0.30$, $S_{3,0} = 0.15$.

The finite-sample size and power of the t-test vis-a-vis
their corresponding asymptotic values provide a means
of interrogating the finite sample performance of quantum
state ML inferential methods. Recall from Section
II A that the size of a hypothesis test is the probability
of rejecting the null hypothesis given that it is true.
Therefore, the finite sample size of the two-sided t-test
for the hypothesis testing problem $H_0: f(\theta) = f(\theta_0)$
against the alternative hypothesis $H_1: f(\theta) \neq f(\theta_0)$ is
given by the fraction of times
$t = \frac{1}{\hat{\sigma}(f(\hat{\theta}))}\left(f(\hat{\theta}) - f(\theta_0)\right)$
is greater than 1.96 in absolute value; the t-test has an
asymptotic size of 0.05.

Similarly, as defined in Section II A, the power of a
hypothesis test is the probability of rejecting the null
hypothesis given that it is false. Hence, the finite-sample
power for the hypothesis testing problem $H_0: f(\theta) = c$
against the alternative hypothesis $H_1: f(\theta) \neq c$,
where $c \neq f(\theta_0)$, is given by the fraction of times
$t = \frac{1}{\hat{\sigma}(f(\hat{\theta}))}\left(f(\hat{\theta}) - c\right)$
is greater than 1.96 in absolute
value; the t-test has an asymptotic power of 1. It can
be shown [l^ that the t-test based on ML estimates is
UMP; it is therefore ideal for interrogating finite sample
performance of frequentist hypothesis testing.
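The Monte Carlo definitions of size and critical values given above reduce to counting and sorting. The sketch below uses a deliberately simple Gaussian toy model, not the quantum likelihood: estimates are drawn with a finite-sample standard error of 0.19 while the t-statistic is scaled by a reported standard error of 0.02, values chosen to echo the orders of magnitude quoted in this section. All function names and the model itself are assumptions made for illustration.

```python
import random

def rejection_rate(t_stats, crit=1.96):
    """Fraction of simulated t-statistics with |t| > crit."""
    return sum(1 for t in t_stats if abs(t) > crit) / len(t_stats)

def empirical_quantile(values, q):
    """q-th empirical quantile, interpolating linearly between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def simulate_t_stats(theta0, true_se, reported_se, reps, seed=0):
    """Toy model: theta_hat ~ N(theta0, true_se^2), t scaled by reported_se."""
    rng = random.Random(seed)
    return [(rng.gauss(theta0, true_se) - theta0) / reported_se
            for _ in range(reps)]

# The null hypothesis is true by construction, so the rejection rate is the size.
t_stats = simulate_t_stats(theta0=0.02, true_se=0.19, reported_se=0.02, reps=1000)
size = rejection_rate(t_stats)
crit_lo = empirical_quantile(t_stats, 0.025)
crit_hi = empirical_quantile(t_stats, 0.975)
```

In this toy model the size is far above the nominal 0.05 and the empirical 2.5% and 97.5% quantiles lie far outside the interval from -1.96 to 1.96, which is the qualitative pattern reported in Tables IV and V.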

We consider single hypothesis testing problems involving
restrictions on the parameters $\theta$ of the density
matrix. In particular, for the spin-1/2 system, we test the
hypothesis $H_0: \theta_2 = \theta_{2,0}$ against the alternative
$H_1: \theta_2 \neq \theta_{2,0}$. For the spin-1 system, the
hypothesis tested is $H_0: \theta_6 = \theta_{6,0}$ against the
alternative $H_1: \theta_6 \neq \theta_{6,0}$.
These hypotheses provide information about the
finite-sample size of the testing procedure. For the
finite-sample power, we test the hypothesis $H_0: \theta_2 = 0.5$
against the alternative $H_1: \theta_2 \neq 0.5$ for the spin-1/2
system and the hypothesis $H_0: \theta_6 = 0.5$ against the
alternative $H_1: \theta_6 \neq 0.5$ for the spin-1 system. Note that
the true values of both $\theta_2$ for the spin-1/2 system and
$\theta_6$ for the spin-1 system are sufficiently different from
0.5 to ensure a proper size and power comparison of the
tests.

Tables IV and V report the finite-sample size and
power of the t-tests for the above hypotheses for the
mixed spin-1/2 and spin-1 systems, respectively. Panels
A, B, and C in each Table report results for m = 100,
400, and 1000, respectively. Consider first Table IV. The
finite-sample size of the t-test is 0.87 for m = 100, i.e.,
the probability of rejecting the null, $H_0: \theta_2 = \theta_{2,0}$, given
that it is true is 87%. This is more than 17 times bigger
than the theoretically predicted asymptotic value of
0.05. This is a reflection of the fact that the asymptotic
standard errors grossly underestimate the estimation error
in finite samples. The asymptotic standard error for
$\theta_2$ is about 0.02, an order of magnitude smaller than the
finite-sample value 0.19 (see Table II, Panel A).
Consequently, the 2.5% and 97.5% quantiles from the
finite-sample distribution of the t-statistic are -37.8 and 27.7,
respectively, whereas the corresponding asymptotic values
are only -1.96 and 1.96, respectively.



The increase in sample size to 400 and 1000 does
not improve the size of the test. In fact, the
finite-sample distribution of the t-statistic widens relative to
the asymptotic standard normal distribution. The 2.5%
and 97.5% quantiles from the finite-sample distribution
of the t-statistic are -50.2 and 34.7, respectively, for
m = 400 and -66.1 and 66.4, respectively, for m = 1000.
This is because, although the finite-sample standard
errors decrease with the increase in the sample size, the
rate of convergence is much slower than the theoretically
predicted rate $\sqrt{m}$. The asymptotic standard errors for
$\theta_2$ for m = 400 and m = 1000 relative to m = 100 are 0.2
and 0.1, respectively. However, the finite-sample
standard errors for $\theta_2$ for m = 400 and m = 1000 relative
to m = 100 are much higher at 0.5 and 0.4, respectively
(see Table II). This is also shown in Panel A of Figure
11, which plots the finite sample t-statistic distribution for
different sample sizes, along with the asymptotic
standard normal distribution. Similar results are obtained
for the spin-1 system in Table V and Panel A of Figure
12.

The power of the t-test is 0.98 for m = 100, close
to the asymptotic value of 1. For higher sample sizes,
both the finite sample and asymptotic powers coincide
at 1. However, note that the finite sample size (above)
is also close to 1 in spite of 0.5 being substantially different
from $\theta_{2,0} = 0.02$, indicating that the testing procedure is
incapable of distinguishing between the two hypotheses.
Similar results are obtained for the spin-1 system (Table
V).

In certain applications, such as quantum information
processing, it is important to simultaneously test
whether several elements or parameters of the density
matrix have prescribed values. For these testing problems
involving multiple parameter restrictions, we rely
on the Wald test. As an example, for the spin-1/2 system,
we test the hypothesis that each of the parameters
$\theta_i$, $1 \le i \le 3$, is equal to its true known value:
$H_0: \theta_i = \theta_{i,0}$ against the alternative
$H_1: \theta_i \neq \theta_{i,0}$. The Wald statistic is given by

$$W = v^T \Sigma^{-1} v, \tag{20}$$

where $v = (\hat{\theta}_1 - \theta_{1,0}, \ldots, \hat{\theta}_3 - \theta_{3,0})$ and $\Sigma$ is the
estimated asymptotic covariance matrix of the parameter
estimates. Asymptotically, under the null, the Wald
statistic converges in distribution to a chi-squared random
variable $\chi^2_k$, with the number of degrees of freedom
k equal to the number of parameter restrictions (in this
case 3). A size $\alpha = 0.05$ Wald test has a rejection region
corresponding to the tail of the $\chi^2_k$ distribution beyond
which the probability mass is 0.05. For
k = 3, the null hypothesis is rejected if W > 7.81.
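Equation (20) can be evaluated without forming the inverse covariance matrix explicitly, by solving $\Sigma x = v$ and taking $W = v^T x$. The sketch below does this with a small Gaussian elimination routine so that it depends only on the standard library; the helper names and the numerical inputs are illustrative assumptions, and the critical value 7.81 is the one quoted above for k = 3.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def wald_statistic(theta_hat, theta_null, cov):
    """Equation (20): W = v^T Sigma^{-1} v with v = theta_hat - theta_null."""
    v = [a - b for a, b in zip(theta_hat, theta_null)]
    x = solve_linear(cov, v)
    return sum(vi * xi for vi, xi in zip(v, x))

CHI2_3_CRIT_95 = 7.81  # 5% critical value for k = 3 quoted in the text

# Hypothetical estimates with independent standard errors of 0.02 each.
cov = [[0.0004, 0.0, 0.0], [0.0, 0.0004, 0.0], [0.0, 0.0, 0.0004]]
w = wald_statistic([0.12, -0.16, -0.03], [0.10, -0.14, -0.07], cov)
reject = w > CHI2_3_CRIT_95
```

As with the t-statistic, W scales inversely with the estimated covariance, so an asymptotic covariance that is too small pushes W past the critical value even when the null is true.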

The Wald test at significance level $\alpha = 0.05$ has
asymptotic size 0.05 and power 1. The finite sample
size of the test for the hypothesis testing problem
$H_0: \theta_i = \theta_{i,0}$, $i = 1, \ldots, 3$, against the
alternative hypothesis $H_1: \theta_i \neq \theta_{i,0}$, $i = 1, \ldots, 3$, is
given by the fraction of times $W = v^T \Sigma^{-1} v$, where
$v = (\hat{\theta}_1 - \theta_{1,0}, \ldots, \hat{\theta}_3 - \theta_{3,0})$, is greater than 7.81.
The finite-sample power for the hypothesis testing problem
$H_0: \theta = c$ against the alternative hypothesis
$H_1: \theta \neq c$, where $c \neq \theta_0$, is given by the fraction of
times $W = v^T \Sigma^{-1} v$, where $v = (\hat{\theta}_1 - c_1, \ldots, \hat{\theta}_3 - c_3)$,
is greater than 7.81. The Wald test based on ML estimates
is asymptotically equivalent to a likelihood ratio
test, and hence is UMP (see Section II for details).

The finite-sample size and power of the Wald test
for the above hypothesis for different sample sizes are
reported in Table IV, for the spin-1/2 system. The
finite-sample size of the Wald test for m = 100 is 0.98, almost
20 times bigger than the asymptotic value of 0.05. The
95% quantile from the finite-sample distribution of the
Wald statistic is 47.5, whereas the corresponding asymptotic
value is only 7.81. As with the t-test, the increase
in sample size to 400 and 1000 does not improve the size
of the Wald test. The finite-sample distribution of the
Wald statistic shifts further away from the asymptotic
chi-squared distribution with increase in sample size.
This is also revealed in Panel B of Figure 11, which plots
the finite sample Wald statistic distribution for different
sample sizes, along with the asymptotic chi-squared
distribution. The finite sample performance of the Wald
statistic is also poor for the spin-1 system, as reported in
Table V and Panel B of Figure 12, although the power of
the test improves for m = 1000.

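Finally, another important conjecture to test regarding
the density matrix is whether an eigenvalue of $\rho$ has a
given value. Hence, we also test the hypothesis $H_0$ that
an eigenvalue of $\rho$ equals its true (known) value. We
denote the i-th eigenvalue of the estimated density matrix
$\rho(\hat{\theta})$ by $S_i(\hat{\theta})$. The Hellmann-Feynman theorem can be
used to compute $\partial S_i / \partial \theta$. Denoting the eigenvectors of $\rho$
by $x_i$, we have:

$$
\begin{aligned}
\frac{\partial S_i}{\partial \theta}
&= \frac{\partial x_i^\dagger}{\partial \theta}\, \rho\, x_i
 + x_i^\dagger \rho\, \frac{\partial x_i}{\partial \theta}
 + x_i^\dagger \frac{\partial \rho}{\partial \theta}\, x_i \\
&= S_i \left( \frac{\partial x_i^\dagger}{\partial \theta}\, x_i
 + x_i^\dagger \frac{\partial x_i}{\partial \theta} \right)
 + x_i^\dagger \frac{\partial \rho}{\partial \theta}\, x_i \\
&= x_i^\dagger \frac{\partial \rho}{\partial \theta}\, x_i ,
\end{aligned}
$$

since $\langle x_i | x_i \rangle = 1$, $\frac{\partial}{\partial \theta}\langle x_i | x_i \rangle = \frac{\partial}{\partial \theta}(1) = 0$.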

For spin-1/2 systems, the exact analytical solution for
the eigenvalue derivatives is available directly since the
characteristic polynomial is a quadratic:

$$S_1 = \tfrac{1}{2}\left( r_{22} + r_{11} + \sqrt{r_{11}^2 - 2 r_{11} r_{22} + 4 r_{12}^2 + r_{22}^2 + 4 i_{12}^2} \right),$$

$$S_2 = \tfrac{1}{2}\left( r_{22} + r_{11} - \sqrt{r_{11}^2 - 2 r_{11} r_{22} + 4 r_{12}^2 + r_{22}^2 + 4 i_{12}^2} \right),$$

where $r_{ij} = \mathrm{Re}(\rho_{ij})$, $i_{ij} = \mathrm{Im}(\rho_{ij})$. However, it is
difficult to analytically compute asymptotic standard
errors for the estimated eigenvalues of higher-dimensional
quantum states by using the Hellmann-Feynman theorem
since the characteristic polynomial is of higher order.
Here, eigenvalue test statistics, along with the size
and power of the associated tests, were computed for
the spin-1 system under consideration (Table V), because
they are representative of the higher-order nonlinearity
common in applied problems. Note that hypothesis
tests involving a single eigenvalue are t-tests.
The finite sample performance of the eigenvalue tests
is roughly identical to that of the above parameter
tests, indicating that decisions (such as control logic)
based on density matrix eigenvalues or state entropy
estimated by frequentist methods are expected to perform
poorly for such sample sizes.
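The Hellmann-Feynman expression $\partial S_i / \partial \theta = x_i^\dagger (\partial \rho / \partial \theta) x_i$ can be checked numerically in the spin-1/2 case, where the closed-form eigenvalues above are available. The sketch below takes $\theta$ to be the real off-diagonal part $r_{12}$, so that $\partial \rho / \partial r_{12}$ is the matrix with ones off the diagonal, and compares the result with a central finite difference. The function names and the example matrix are illustrative assumptions.

```python
import math

def eigenvalues_2x2(r11, r22, r12, i12):
    """Closed-form eigenvalues (S_1 >= S_2) of a 2x2 Hermitian matrix."""
    disc = math.sqrt((r11 - r22) ** 2 + 4.0 * (r12 ** 2 + i12 ** 2))
    return 0.5 * (r11 + r22 + disc), 0.5 * (r11 + r22 - disc)

def eigenvector_2x2(r11, r22, r12, i12, lam):
    """Unit eigenvector for eigenvalue lam; assumes rho_12 is nonzero."""
    b = complex(r12, i12)
    x = (b, complex(lam - r11, 0.0))
    norm = math.sqrt(abs(x[0]) ** 2 + abs(x[1]) ** 2)
    return (x[0] / norm, x[1] / norm)

def hf_derivative_wrt_r12(r11, r22, r12, i12):
    """dS_1/dr_12 = x^dagger (d rho / d r_12) x with d rho / d r_12 = [[0,1],[1,0]]."""
    s1, _ = eigenvalues_2x2(r11, r22, r12, i12)
    x0, x1 = eigenvector_2x2(r11, r22, r12, i12, s1)
    return 2.0 * (x0.conjugate() * x1).real

def fd_derivative_wrt_r12(r11, r22, r12, i12, h=1e-6):
    """Central finite-difference approximation of dS_1/dr_12."""
    up = eigenvalues_2x2(r11, r22, r12 + h, i12)[0]
    dn = eigenvalues_2x2(r11, r22, r12 - h, i12)[0]
    return (up - dn) / (2.0 * h)

# Hypothetical spin-1/2 density matrix (unit trace, positive eigenvalues).
rho = dict(r11=0.6, r22=0.4, r12=0.2, i12=-0.1)
hf = hf_derivative_wrt_r12(**rho)
fd = fd_derivative_wrt_r12(**rho)
```

The two derivatives agree to within the finite-difference error. For spin-1 states the same comparison requires a numerical eigensolver, since no comparably simple closed form is available.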

A hypothesis of practical interest is the purity of the 
state. In this case, parameter values in the null hypoth- 
esis lie on the boundary of the maintained hypothesis. 
Hence, standard regularity conditions (the parameter 
value in the null should be an interior point of a com- 
pact set) that ensure asymptotic convergence of the null 
distributions of the t-statistic and the Wald statistic to 
standard normal and chi-squared distributions, respec- 
tively, fail to hold. Andrews [l^l provides general asymp- 
totic results for testing problems of this sort. He derives 
the asymptotic null and local alternative distributions 
of the test statistics under a set of high level conditions. 
Although the distributions are non-standard, the critical 
values can be obtained by simulation. 
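The remark that critical values can be obtained by simulation can be illustrated with a one-parameter toy problem that shares the boundary feature but is otherwise unrelated to the density matrix: unit-variance normal data whose mean is constrained to be nonnegative, with the null value 0 on the boundary. This is not the procedure of the cited reference, only a sketch of the simulation idea; all names and settings are assumptions.

```python
import math
import random

def boundary_estimate(sample):
    """ML estimate of a mean constrained to be >= 0 (unit-variance normal data)."""
    return max(0.0, sum(sample) / len(sample))

def simulate_null_statistics(m, reps, seed=0):
    """Simulate sqrt(m) * theta_hat under the null theta = 0 (on the boundary)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(m)]
        stats.append(math.sqrt(m) * boundary_estimate(sample))
    return stats

def upper_critical_value(stats, alpha=0.05):
    """Empirical (1 - alpha) quantile of the simulated null distribution."""
    xs = sorted(stats)
    index = min(len(xs) - 1, int(math.ceil((1.0 - alpha) * len(xs))) - 1)
    return xs[index]

null_stats = simulate_null_statistics(m=50, reps=4000)
crit = upper_critical_value(null_stats)
frac_at_boundary = sum(1 for s in null_stats if s == 0.0) / len(null_stats)
```

In this toy model about half of the simulated statistics sit exactly at zero, so the null distribution is not normal, and the simulated 5% critical value should come out near the one-sided normal value of about 1.645 rather than 1.96.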

The salient conclusion from the above examples is that
the size and power of hypothesis tests deviate substantially
from the asymptotically predicted values, and that
the finite sample test statistic distributions are not normal
and chi-squared, as would be expected from asymptotic
theory. In finite samples of practical size, hypothesis
testing performs rather poorly for parameters, elements
of $\rho$, or eigenvalues, underscoring the inadequacy of
the frequentist quantum state estimation approach for
such sample sizes.



VII. DISCUSSION AND EXTENSIONS 

In this paper we have examined the performance of 
frequentist estimators of the density matrix of a quan- 
tum system using quantum observations simulated using 
different measurement strategies. We provided numeri- 
cal techniques for likelihood optimization under multiple 
constraints that are robust across arbitrary spin-1/2 and 
spin-1 system density matrices. In addition, we have 
presented methodologies and prescriptions for hypoth- 
esis testing in the frequentist quantum framework, an 
essential requirement for optimal decision making and 
control. 

Performing inference in the frequentist framework, we
find that the finite sample variances are significantly
larger than the asymptotic bounds (predicted by the

FIG. 11: Finite sample test statistic distributions, spin-1/2
quantum system. (A) t-statistic distribution for null hypothesis
$H_0: \theta_2 = \theta_{2,0}$. The asymptotic t-statistic distribution
is standard normal (superimposed). (B) Wald-statistic
distribution for null hypothesis $H_0: \theta_i = \theta_{i,0}$, i = 1, 2, 3. The
asymptotic Wald statistic distribution is chi-squared with
degrees of freedom equal to 3, the number of restrictions
(superimposed).

FIG. 12: Finite sample test statistic distributions, spin-1
quantum systems. (A) t-statistic distribution for null
hypothesis $H_0: \theta_6 = \theta_{6,0}$. The asymptotic t-statistic
distribution is standard normal (superimposed). (B) Wald-statistic
distribution for null hypothesis $H_0: \theta_i = \theta_{i,0}$, i = 3, 4, 6.
The asymptotic Wald statistic distribution is chi-squared
with degrees of freedom equal to 3, the number of
restrictions (superimposed).



Cramer-Rao theorem) for typical experimental sample 
sizes, and that these bounds are not approached at the 
rate predicted by asymptotic theory. This finding is ro- 
bust to the choice of the density matrix and is more 
pronounced for small datasets. A prior study [8] showed 
close correspondence between asymptotic and finite sam- 
ple performance in a single example, but did not con- 
sider the rate of convergence to the CRB, or higher- 
dimensional systems. 

A unique feature of quantum statistics is the quan- 
tum Cramer-Rao bound, a generalization of the classi- 
cal Cramer-Rao bound that originates due to the depen- 
dence of the quantum Fisher information on the mode of 
measurement. In prior work, considerable attention has 
been devoted to understanding the asymptotic relative 
efficiencies of different quantum measurement strategies. 
Our results warrant a careful re-examination of the
relative efficiencies of these measurement strategies in
finite samples. Recent studies [13] have aimed to assess
the resource requirements of various quantum tomography
implementations employing different measurement
strategies; future efforts along these lines would benefit
from attention to finite sample losses and the rate at 



which asymptotic predictions are approached. 

Given that the finite sample variances are order(s)
of magnitude bigger than the corresponding asymptotic
ones, we conclude that in order to improve parameter
estimates in finite samples, it is important to incorporate
additional information that exploits the information
geometry of quantum states into the estimation procedure.
An ideal approach in this regard is Bayesian estimation.
Since they are based on updating a prior plausibility
distribution about the parameters using the observed
data, Bayesian methods are generally more reliable than
standard frequentist methods away from the asymptotic
limit. The prior plausibility distribution permits the
introduction of auxiliary information about the parameter
space that is not contained within the likelihood.
By contrast, such information is impossible to incorporate
in frequentist estimation techniques. Consequently,
due to the incorporation of prior knowledge, the credible
intervals produced by Bayesian inference are generally
shorter than their asymptotic frequentist counterparts.
Moreover, as we have seen, Bayesian credible intervals
do not refer to the asymptotic limit of an infinite number
of measurements.

APPENDIX A: AVERAGE-CASE SUBOPTIMAL MEASUREMENT BASES

The following complete, nonoptimal measurement
bases were used in Section VI C in order to assess the
finite sample losses incurred due to not using mutually
unbiased measurements for spin-1 systems:

          (  0.732 + 0.350i   -0.078 - 0.223i   -0.163 - 0.507i )
V^(1)  =  ( -0.078 - 0.223i    0.705 - 0.649i   -0.152 - 0.036i )
          ( -0.163 - 0.507i   -0.152 - 0.036i    0.460 - 0.693i )

          (  0.571 + 0.016i   -0.738 - 0.337i   -0.122 - 0.024i )
V^(2)  =  (  0.074 + 0.144i   -0.089 - 0.00...i  0.790 + 0.584i )
          (  0.796 - 0.118i    0.563 + 0.130i   -0.075 + 0.114i )

          (  0.301 + 0.387i   -0.095 - 0.697i   -0.168 - 0.487i )
V^(3)  =  (  0.538 + 0.003i    0.281 - 0.336i    0.197 + 0.693i )
          (  0.674 + 0.124i    0.119 + 0.548i    0.264 - 0.382i )

          ( -0.115 + 0.952i    0.009 + 0.214i   -0.129 - 0.132i )
V^(4)  =  (  0.009 + 0.214i    0.290 - 0.391i    0.841 + 0.097i )
          ( -0.129 - 0.132i    0.841 + 0.097i   -0.156 - 0.474i )

These bases were generated through rotation of MUB
measurement bases according to the method described
in Section IV C, with a = 1.2 in equation (17).
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